ABSTRACT 

Efforts of the Department of Education (ED) to 
simplify the Pell Grant formula by reducing the number of data 
elements used to calculate awards (i.e., data element reduction) are 
evaluated. A framework is developed to assess the critical 
characteristics of individual data elements, to eliminate elements 
from the formula, and to develop proposals for data element 
reduction. Individual data items used in the Pell eligibility and 
award formulae are evaluated on the basis of five measures: budget 
impact, aggregate distributional impact, sensitivity, reliability, 
and verifiability. Included is a comparison of two simulations of a 
reduction in the number of data elements used in the Pell eligibility 
and award formulae. The two simulations, the standard and the error 
free simulations, are identical except for the data base used. Both 
simulations eliminate all but five data elements (adjusted gross 
income, federal taxes paid, nontaxable income, number in household, 
and number in postsecondary education). The applicant-based model and 
data base, and techniques used to adjust the ED applicant data base 
for the error patterns found in the Pell Stage III data, are 
described in appendixes. References and a substantial series of data 
tables are also appended. (i>W) 
INTRODUCTION 

The Department of Education (ED) has been interested in simplifying the Pell 
Grant formula by reducing the number of data elements used to calculate awards. 
This endeavor is commonly called data element reduction. Three overarching 
objectives motivate ED's approach to data element reduction. A reduced Pell formula 
must: 

• Maintain or enhance the ability of the program to efficiently identify the 
target population, 

• Simplify, streamline and make more understandable the determination of 
program eligibility and resulting awards, and 

• Reduce the program distortions associated with error-prone, difficult to 
verify data elements. 

Any data element proposal is also subject to the following constraints: 

• Minimize the redistributional effects caused by data element reduction, 
and 

• Neutralize the potential budgetary impact. 

These objectives are not easily achieved. In fact, past attempts to eliminate 
data elements from the Pell formula have faltered because policymakers have been 
unable to demonstrate that these objectives could be achieved subject to the 
constraints identified. 

Past analyses of reduced Pell formulae have assumed that eliminating 
infrequently reported data elements to increase efficiency automatically decreased 
equity by adversely affecting the awards of groups of recipients (e.g. those with high 
medical/dental expenses). The current analysis suggests that data elements placed in 
the Pell formula to enhance equity may actually undermine equity by introducing 
reporting error that distorts award patterns. These data elements may not have their 
intended effects on targeted recipients and their elimination may actually increase 
equity. Thus, a reduced Pell formula could achieve both efficiency and equity without 
massive distortions to awards for the vast majority of recipients. 



The current analysis of data element reduction uses an approach that is 
fundamentally different from past analyses. A framework was developed to assess the 
critical characteristics of individual data elements and rank them under known 
assumptions. The framework allows one to select elements to eliminate from the 
formula and, thus, alternative data element reduction proposals can be developed for 
analysis and comparison. One recent proposal for a five element formula is discussed 
in-depth in Chapter 3 of this report. 



SUMMARY OF FINDINGS 



The analysis has produced many useful findings concerning data element reduc- 
tion, including: 



• The analytic framework used in this analysis can be a powerful tool for 
developing rational, defensible data element reduction proposals. 

• Pell Grant data elements can be ranked in an objective, value-free manner 
according to their impact on the program. 

• Data elements can be identified for retention in the formula or elimination 
on the basis of this ranking. 

• The analysis of the five data element Pell formula with a standard and an 
"error free" data base suggests almost identical patterns in individual 
awards: 

  - few recipients lose large amounts (over $400-$300) 

  - the neediest students, those receiving the highest awards, continue to 
    receive high awards (98 percent receive within $200 of the maximum 
    award) 

  - a disproportionate number of recipients who lose eligibility received 
    low awards ($500 or less under the full formula) 

• The cost estimates using ED's standard data base, which contains reporting 
errors, must stand as official estimates of the likely cost of data element 
reduction. However, a comparison of the cost estimates produced by the 
standard and error free simulations provides a potential budget range for a 
five element formula ($2.6 billion using standard data, $2.4 billion using 
error free data). 

• The analysis suggests that increased costs incurred by reducing the formula 
to five elements could be potentially "financed" simply by eliminating error 
from the remaining elements, rather than adjusting formula taxation rates 
upward. 
More specific findings from both the assessment of individual data elements and 
the analysis of a five element Pell formula follow. 

Assessment of Individual Elements 

The assessment of the impact of individual data elements has demonstrated that 
this analytic framework is both an appropriate and effective policy tool. The 
framework has provided a means for systematically evaluating and ranking 17 data 
elements in the Pell eligibility and award formulae across five measures. The 
framework provides a means of integrating these discrete measures (budgetary and 
distributional impact, sensitivity, reliability and verifiability). 

We have provided two examples of how such an integration can be conducted and 
demonstrated how the results of these examples can inform policymakers in their 
consideration of data element reduction. In the first example, using equal weights for 
all measures, we ranked the data elements and classified them into three groups: high 
(high rankings on most measures), moderate (mixed rankings on these measures), and 
low (low rankings on most measures). 

The data items were classified in the example as follows: 
High 

• Adjusted Gross Income 

• Social Security Education Benefits 

• U.S. Taxes Paid 

• Family Size Offset 

• Employment Expense Offset 
Moderate 

• Net Home Equity 

• Number in College 

• Nontaxable Income 

• Veteran's Education Benefits 

• Elementary and Secondary Tuition 
Low 

• Dependent Student's Net Assets 

• Net Investment Equity 

• Dependent Student's Income 

• Net Business/Farm Equity 

• Student Marital Status 

• Cash/Savings/Checking 

• Unusual Medical/Dental Expenses 



The example generally suggests that the data items in the low classification 
could be considered for elimination from the Pell formulae with minimum impact 
across the five measures (budgetary and distributional impacts, sensitivity, reliability, 
and verifiability). Those classified as moderate would require closer scrutiny and 
would have higher impact. Those classified as high, for all practical purposes, could 
not be eliminated without substantial impact to the program. An example using 
differential weights for the measures resulted in two changes in the rankings and no 
changes to the classifications. 

The discussion above is only a summary of the examples. The results of these 
must be put into the context provided by the thorough discussion of the analysis, 
findings, and the caveats provided in Chapter 2. 

Analysis of a Five Element Formula 

Whereas Chapter 2 presents a methodology and data for developing data element 
reduction proposals, Chapter 3 presents a detailed and thorough analysis of the 
budgetary and distributional impact of one data element reduction proposal, a five 
element formula. Two simulations, conducted for Advanced Technology by the 
Division of Policy and Program Development (DPPD), Office of Student Financial 
Assistance, formed the basis of the analysis. The first simulation used a standard 
applicant data base in conducting model runs of full and five element formulae. The 
second simulation was identical to the first except that an "error free" data base was 
used to simulate the effect of eliminating error along with data elements. (A 
description of the imputation procedures used to develop this unique data base is 
contained in Technical Appendix B.) A comparison of the two simulations produced 
the following findings: 

• Differences in impact are most evident on the aggregate level of program 
costs and number of recipients. 

• The error free simulation results in nearly 150,000 fewer recipients and a 
slightly higher budgetary impact than the standard simulation. However, 
the baseline budget was substantially lower (about $200 million) for the 
error free simulation. 

• The error free simulation produces a lower baseline budget (about $2.2 
billion) and the five element formula without taxation increases roughly 
equals the standard simulation full formula baseline costs (about $2.4 
billion). This calls into question the need to increase taxation rates in the 
simulation. 

• Average awards for the error free simulation are unchanged but lower than 
the standard simulation, in which awards decline. 

• On most other dimensions (e.g., numbers of awards increasing, decreasing, 
or staying the same by applicant characteristic) the differences are 
minimal. 

These findings and the analysis of the simulation are discussed in detail in 
Chapter 3. 

BACKGROUND 

Discussions surrounding the number and type of data elements used in deter- 
mining eligibility and award for the Pell Grant program are as long-standing as the 
program itself. These discussions typically have focused on several major policy- 
relevant issues including the program costs for different combinations of data 
elements, the sensitivity of different formulae to specific groups of applicants, and 
the redistributive effects of adding or eliminating data elements. In addition, the 
relationship of the Pell formula to the overall student aid delivery system has been a 
concomitant issue. 

Recently, the findings of the Pell Grant Quality Control (QC) Project have 
resurfaced data element reduction as a potential corrective action which could lower 
program-wide error through eliminating error-prone data elements from the Pell SAI 
and award formulae, and simplify the application process as well. The Pell Grant QC 
Project measured quality in the delivery of funds in the Pell Grant Program. Using a 
variety of data collection methods, including institutional site visits, record 
abstractions, personal interviews with parents and students, and acquisition of IRS 
records, the project recomputed awards based on the most reliable data and then 
compared them with original awards and institutional disbursements. The results of the project were 
twofold. First, the analyses generated program-wide estimates of errors; second, 
these analyses identified data elements in the SAI and award formulae that were 
error-prone and difficult to validate. Consequently, as part of the Title IV Quality 
Control Project, the Division of Quality Assurance (DQA) has identified Pell Grant 
data element reduction as a potential corrective action to reduce errors and has 
requested a series of analyses to support ED policymakers in the renewed policy 
discussion surrounding data element reduction. 

Numerous analyses of data element reduction have been undertaken in recent 
years. Most have focused on the budgetary impact of reduction and the alteration of 
the award patterns that exist under the current formula, which are most often used as 
a measure of program equity. However, none of these analyses was able to analyze 
fully the impact of data element reduction for at least two reasons. First, most 
previous analyses assumed that reported application data were correct and hence 
failed to capture the effects of error on the program. Second, none of these recent 
analyses was able to systematically evaluate the impact of data elements across 
several diverse program goals. 

Program-wide analyses of several combinations of data elements in a reduced 
eligibility formula conducted by Advanced Technology during Stage II of the Pell QC 
Project accounted for error by using verified data in the simulations.^1 Despite 
controlling for applicant error for the first time, these analyses were conducted on a 
recipient data base and therefore the impacts of these alternative combinations on 
newly eligible recipients could only be estimated. As a part of the present policy 
option, preliminary analyses were conducted to measure the program-wide effects of 
data element reduction at a detailed level.^2 These analyses utilized data from the 
official ED applicant-based model, with the assistance of the Pell Grant Branch, 
DPPD, to measure the effects of data element reduction on subpopulations of 
applicants. While these data brought the strengths of an applicant data base to the 
analyses, the analyses could not account for application error, a major source of 
program error. However, the findings from the 1982-83 Pell Grant QC Project allow 
substitution of more accurate data for error-prone data elements through the creation 
of an adjusted applicant data base and measurement of the effects of data element 
reduction on the pattern of awards. This provides a more accurate basis for comparing 
distributions of awards under the full and reduced data element formulae. Both the 
preliminary and the present analyses of full and reduced formulae hold the budget 
constant by adjusting upward the taxation rates. 

^1 Compilation of Quality Control Findings; Information on Policy Options, March 
1983. 

^2 Title IV Quality Control Policy Option: Preliminary Analysis of a Simulated 
Five Data Element Pell Grant Eligibility Formula, September 198[?]. 

Another approach to data element reduction was proposed by Advanced 
Technology. An informal position paper presented a framework for systematically 
evaluating the impact of individual data elements. The Stage III Corrective Actions 
volume from the Pell QC Project utilized this framework and presented an 
approximation of the impact of each element across five criteria, using Stage III Pell 
recipient data. 

This policy option report represents an integration of the approaches from 
several prior analyses and benefits from the strengths of each. The analysis has two 
discrete parts. The first, which was recommended in the Stage III Corrective Actions 
volume, assesses the impact of individual data elements on five program dimensions: 

• Budgetary Impact 

• Aggregate Distributional Impact 

• Sensitivity 

• Reliability 

• Verifiability 



Quality in the Pell Grant Delivery System, Volume 2, Corrective Actions , April 
1984, pp. *-8 through *-13. 
These dimensions and the assessment methodology are described in Chapter 2 of this 
report. 

The second analysis compares distributional trends resulting from program-wide 
simulations of the applicant-based model for the full formula used for the 1982-83 
academic year with a five element formula using both reported data (those containing 
error) and error adjusted or "best" data (from which error found in the Pell QC Stage 
III has been corrected). Chapter 3 contains this analysis.* 

ANALYTIC CONTEXT 

The nature and focus of the analysis conducted for this policy option report must 
be carefully delineated and explicitly contrasted with policymaking. Both analyses 
(the program-wide simulation of the full and five element formulae and the assessment 
of the impact of individual data elements) have been designed to provide data with 
which ED policymakers can make informed policy decisions. We have avoided making 
implicit policy decisions throughout our analysis. For example, the goal of assessing 
individual data elements is to provide policymakers with a framework for ranking data 
elements according to their impact, not to advance any one proposal within this paper. 
Nevertheless, analysis such as this requires making judgments in order to provide data 
to ED for policymaking purposes. We have clearly identified points at which 
judgments were made and explicitly stated these judgments. 

In addition, the policy relevance of the findings must be delineated carefully, 
particularly with regard to simulating the program-wide effects of reducing the 
number of data elements in the Pell eligibility and award formula to five. The analysis 
has been designed as an evaluation, not as a forecast. The emphasis of the assessment 
of individual data elements is the measurement of the impact of data elements across 
several dimensions. Therefore, the findings from both analyses can isolate the effects 
of data element reduction within a research context; only official ED estimates can 
stand as forecasts of likely policy consequences. 

Some general comments should be offered concerning the data base, simulations 
and generalizability of the results of our analyses. These simulations utilize a large 
data base that permits generalization to the population of applicants. Different 
eligibility criteria, however, are likely to change the composition of the applicant 
population. We were unable to account for this likelihood in this analysis, since the 
model and our analyses simulate the effects of program changes on an existing and 
static applicant population. Also, the results of the assessment of individual data 
elements are, to a degree, formula specific, although some of the results would be 
identical. The degree of difference between the formula used and another (a 
subsequent year or reduced form) must be examined and considered before 
generalizations could be considered. This analysis focuses explicitly on the impact of eliminating 
data elements from the eligibility and award formulae. It does not assess the 
implications of eliminating items from the application form nor does it deal with 
issues of comparability with other need analysis tests or forms. Although these are 
important considerations, they are beyond the scope of this analysis. 

* Technical Appendix A contains descriptions of the ED model, applicant data 
base, and the full and five element formulae simulations. 

This analysis can play the important role of informing the policy debate by 
measuring the efficiency of data element reduction as a corrective action for program 
error by accurately and comprehensively capturing its effects. The assessment of 
individual data elements can also serve as a basis for developing alternative proposals 
for altering the number and types of data elements used in the determination of 
eligibility and award. 

ORGANIZATION OF THE REPORT 

This report is comprised of two chapters that parallel the analysis and technical 
appendices. Chapter 2 describes the analysis and findings resulting from the 
evaluation of the marginal impact of the individual data elements. Chapter 3 
compares two simulations of a reduction in the number of data elements used in the 
Pell eligibility and award formulae using two data bases. The Appendices describe the 
data base and model, the imputation that was conducted to adjust the ED applicant 
data base for the error patterns found in the Pell Stage III data, and additional 
program simulation tables. 
2 

EVALUATION OF THE IMPACT OF INDIVIDUAL DATA ITEMS 



Characteristically, data element reduction has been approached by presenting 
alternative configurations of eligibility formulae with five, six, or seven data elements 
or substituting number of exemptions for household size. These alternatives have then 
been evaluated by measuring changes to the budget and the distribution of awards at 
the program level induced by changing the formula. Despite the intuitive appeal and 
relative ease of such an approach, these analyses have failed to provide either a 
framework or the data for systematically developing and evaluating alternatives. In 
addition, the development and evaluation of data element reduction alternatives are 
subject to competing, if not conflicting, goals which most approaches cannot deal with 
easily. 

Data element reduction most often has been advanced as a strategy to maximize 
two of these program goals: integrity and efficiency. Integrity is maximized by 
making the program less error prone and increasing the reliability of data collected. 
Efficiency is achieved by reducing applicant data burden, administrative costs to 
institutions and application processing costs to the government. However, past 
reduction proposals have run afoul of budget and equity concerns. Analyses of data 
element reduction proposals have suggested that these proposals cause budget 
increases and shifts in distribution of awards that were judged to be unacceptable and 
resulted in decreased program sensitivity to applicant characteristics. Prior policy 
discussions have not provided the framework or data with which to consider these 
goals simultaneously. 

The current approach provides both the framework and the data with which to 
make informed judgments about alternative configurations of data elements. This 
approach provides these by evaluating each data element individually on the basis of 
five measures: 

• Budget Impact 

• Aggregate Distributional Impact 

• Sensitivity 

• Reliability 

• Verifiability 

The approach also ranks the data elements for each measure ordinally from the highest 
to the lowest impact. 

This approach also allows for simultaneous consideration of these measures in 
order to enable policymakers to identify groups of items that must remain in the 
formulae, those that can be eliminated with little impact, and those that could be 
eliminated given certain tradeoffs. An underlying premise of the analysis suggests 
that items that rank low on all measures more easily could be eliminated, whereas 
high-ranking items should be retained. 

METHODOLOGY AND ANALYSIS 

The focus of the analysis in this portion of the report is the evaluation of data 
items used in the eligibility and award formulae as they directly affect the award. For 
the most part, these data elements correspond with a single formula item.^1 

Each item was evaluated individually by changing to zero all non-zero reported data 
values for the item being evaluated, such as net home value or unusual medical and 
dental expenses. Table 1 lists the values used to eliminate the item from the formula. 
All awards were then recalculated and analyzed for each of the five measures. For 
one item, family size offset, changes to the SAI software were necessary in order to 
eliminate the data item. 

^1 Two exceptions are Family Size and Marital Status, which affect multiple 
formula elements. 

TABLE 1 

DATA ITEMS EVALUATED THROUGH 
ELIMINATION FROM THE PELL ELIGIBILITY AND 
AWARD FORMULAE 

                                               Value Used to 
                                               Eliminate the 
Data Item                                      Data Item 

Income 
  Adjusted Gross Income                        0 
  Nontaxable Income                            0 
  U.S. Taxes Paid                              0 
  Dependent Student's Income                   0 
  Veteran's Education Benefits                 0 
  Social Security Education Benefits           0 

Assets 
  Net Home Equity                              0 
  Net Investment Equity                        0 
  Cash/Savings/Checking                        0 
  Net Business/Farm Equity                     0 
  Dependent Student's Net Assets               0 

Offsets and Protections 
  Student's Marital Status                     Unmarried 
  Family Size Offset                           0 
  Number in College                            1 
  Unusual Medical and Dental Expenses          0 
  Elementary and Secondary Tuition and Fees    0 
  Employment Expense Offset                    0 

Measures and Database 

In this portion of the analysis five measures are used to assess the impact of 
individual data elements on awards. In order to assess this impact we used the 1982-83 
ED data base and a standard full formula for the 1982-83 program year as a baseline. 
Individual data elements were removed from the formula and awards were recomputed 
using the 1982-83 Pell eligibility and award formulae. The resulting awards were 
multiplied by a sampling weight assigned to each applicant on the file and, for the 
first two measures, by a participation rate assigned by income level. These procedures 
estimate program changes attributable to the elimination of the data element. The 
changes were then analyzed through the five measures, each of which is described 
below. 

• Budgetary Impact is the change in program budget when a data element is 
excluded and the resulting budget is compared with the baseline budget 
under a full formula. 

• Aggregate Distributional Impact is measured as the change in the 
distribution of program funds across income and other categories compared 
against the baseline distribution with all elements included in the formula. 

• Sensitivity is a measure of the relative responsiveness of the program to 
applicants with particular characteristics (e.g., two working parents). 
Sensitivity is reported as the average change between the base award and 
the recomputed award with the data item removed. 

• Reliability is the degree to which reported data accurately represent 
applicants' true characteristics. 

• Verifiability is an assessment of the degree to which items can be checked 
against reliable corroborative data sources. 

The framework utilized requires that we make judgments concerning several 
analytic issues including classification and weighting. In each of the analyses, data 
elements are classified as having high, moderate, or low impact. The basis upon which 
data elements were assigned to these categories is explicitly treated in each of the 
following sections. In the last section of this chapter, the results of the five analyses 
are integrated. Although we have included two examples of weighting schemes, the 
values we assigned to the classifications in order to rank the data items (2, 1, 0 for 
high, moderate, low) remain constant. The use of different values (for example, 5, 1, 0, 
respectively) may alter the ranking and potentially the classification. 
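
The point about classification values can be illustrated with a minimal sketch. It assumes a simple weighted sum of per-measure scores; the function names and the two items (`item_a`, `item_b`) are hypothetical and are not taken from the report's data or from the integration procedure actually used in this chapter.

```python
# Hypothetical sketch: integrating five per-measure classifications into a ranking.
# The items and their classifications below are illustrative, not the report's data.
MEASURES = ["budget", "distribution", "sensitivity", "reliability", "verifiability"]
DEFAULT_SCORES = {"high": 2, "moderate": 1, "low": 0}


def composite_score(classes, scores=DEFAULT_SCORES, weights=None):
    """Weighted sum of classification scores across the five measures."""
    if weights is None:
        weights = {m: 1.0 for m in MEASURES}  # equal weights
    return sum(weights[m] * scores[classes[m]] for m in MEASURES)


def rank_items(items, scores=DEFAULT_SCORES, weights=None):
    """Return item names ordered from highest to lowest composite score."""
    totals = {name: composite_score(c, scores, weights) for name, c in items.items()}
    return sorted(totals, key=lambda name: (-totals[name], name))


items = {
    # high on one measure, low on the other four
    "item_a": {"budget": "high", "distribution": "low", "sensitivity": "low",
               "reliability": "low", "verifiability": "low"},
    # moderate on three measures, low on two
    "item_b": {"budget": "moderate", "distribution": "moderate",
               "sensitivity": "moderate", "reliability": "low",
               "verifiability": "low"},
}

ranking_210 = rank_items(items)
ranking_510 = rank_items(items, scores={"high": 5, "moderate": 1, "low": 0})
```

Under the 2, 1, 0 values the hypothetical `item_b` outranks `item_a` (3 points against 2); under 5, 1, 0 the order reverses (3 against 5), which is the kind of shift the paragraph above cautions about.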

The remainder of this chapter is divided into sections that describe the analysis 
conducted for each of these measures and the findings of these analyses. Each 
measure addresses a specific research question that introduces the sections. 
Budgetary Impact 



One of the primary and often asked questions concerning the effects of data 
element reduction is the impact on the program budget. This portion of our analysis 
was motivated by the following question: How does the program budget change when 
single data elements are removed from the Pell formulae? Within this framework, 
data elements that had high budgetary impact would likely be retained in the formula; 
those with low budgetary impact would be candidates for elimination on the basis of 
budgetary impact. 



In order to address this question, we eliminated each of the 17 data items in turn 
and recomputed awards for cases in which changes to the data element were made and 
summed all weighted awards. The result was a new program budget total. The 
difference between the baseline budget and the new budget is defined as the budgetary 
impact, represented as a dollar difference and percentage change. Table 2 presents 
the ranking of the budgetary impact of removing individual data elements. The data 
elements are ranked from highest to lowest percent absolute change. In addition these 
budgetary changes are classified as high, moderate or low according to the following 
ranges: 

• High — more than 10 percent change in program cost (approximately $250 
million) 

• Moderate — 2 to 10 percent change in program costs (approximately $50 to 
$250 million) 

• Low — less than 2 percent change in program costs (approximately $50 
million or less) 
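
The budgetary impact calculation and the three ranges above can be sketched as follows. This is a minimal illustration, not the ED model: the function names are hypothetical, each weight stands in for the sampling weight (and participation rate) described earlier, and the treatment of changes of exactly 2 or 10 percent is an assumption. The $2,488 million baseline and the -$155 million change for U.S. Taxes Paid are the figures reported with Table 2.

```python
def program_budget(awards, weights):
    """Weighted sum of awards. Each weight is a stand-in for the applicant's
    sampling weight multiplied by the participation rate (an assumption)."""
    return sum(a * w for a, w in zip(awards, weights))


def budgetary_impact(baseline_budget, new_budget):
    """Return (dollar difference, percentage change) relative to the baseline."""
    dollar_diff = new_budget - baseline_budget
    pct_change = 100.0 * dollar_diff / baseline_budget
    return dollar_diff, pct_change


def classify_budget_impact(pct_change):
    """Classify by absolute percentage change: High above 10 percent,
    Moderate from 2 to 10 percent, Low below 2 percent."""
    magnitude = abs(pct_change)
    if magnitude > 10:
        return "High"
    if magnitude >= 2:
        return "Moderate"
    return "Low"


BASELINE = 2488  # baseline budget in millions of dollars, as reported with Table 2
diff, pct = budgetary_impact(BASELINE, BASELINE - 155)  # U.S. Taxes Paid removed
label = classify_budget_impact(pct)
```

Classifying on the absolute value reflects the ranking by "percent absolute change": a data element whose removal lowers the budget is treated the same as one whose removal raises it by the same proportion.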



Several features of Table 2 are noteworthy. Eliminating data elements produces both 
positive and negative changes. Increases in budget result from eliminating income or 
asset items that are used as resources for family contribution to educational costs. 
Conversely, decreases in budget result from eliminating expense allowances that 
protect portions of income from contribution. Adjusted gross income, family size, and 
social security education benefits have the greatest budgetary impact, although the 
changes are both positive and negative. Seven data items (VA education benefits, 
elementary and secondary tuition, investment equity, business/farm equity, 
cash/savings, student's marital status and medical/dental expenses) affect program 
costs by less than 1 percent. Several of these items, the asset items, are subject to 
$25,000 protections and are, for most applicants, "taxed" at 5 percent, effectively 
reducing the budgetary impact of these items. Relatively few applicants report 
tuition expenses or levels of medical expenses high enough (greater than 20 percent of 
effective family income) to reduce family discretionary income. 

TABLE 2 

RANKING OF THE BUDGETARY IMPACT OF ELIMINATING 
DATA ELEMENTS FROM THE ELIGIBILITY AND AWARD FORMULAE 

                                        Dollar Difference    Percentage 
Data Item                               (millions)           Change 

Adjusted Gross Income                   [illegible]          68.66 
Family Size Offset                      [illegible]          [illegible] 
Social Security Education Benefits*     276                  11.10 
U.S. Taxes Paid                         -155                 -6.23 
Net Home Equity                         117                  4.72 
Number in College                       [illegible]          [illegible] 
Nontaxable Income                       [illegible]          [illegible] 
Employment Expense Offset               [illegible]          [illegible] 
Dependent Student's Income              71                   [illegible] 
Dependent Student's Net Assets          35                   1.39 
Veteran's Education Benefits            13                   0.53 
Elementary and Secondary Tuition        -13                  -0.53 
Net Investment Equity                   10                   0.39 
Net Business/Farm Equity                8                    0.34** 
Cash/Savings/Checking                   8                    0.30** 
Student's Marital Status                5                    0.21 
Unusual Medical/Dental Expenses         -2                   -0.08 

Baseline Budget is $2,488 million. 

* The Pell formula no longer contains social security education benefits. It is not 
possible in this analysis to estimate with any accuracy the impact of eliminating this 
data element from different formulae. However, the effects are not likely to 
challenge the findings of this analysis. 

** Difference due to rounding. 

This analysis uncovers an interesting, seemingly anomalous, finding relating to 
the difference between the impact of social security and veteran's education benefits. 
Both of these elements are included in the award formula, which means that they more 
directly affect Pell awards than other elements in the SAI formula that are taxed or 
subject to protections or offsets. However, the budgetary impact of VA education 
benefits is vastly lower than social security education benefits. This is a result of the 
fact that far fewer (about 2 percent) report receiving VA benefits as opposed to social 
security (about 11 percent). The mean value for VA benefits ($3,200) is also slightly 
more than half the mean value for social security ($5,390). These two facts result in a 
substantially lower budgetary impact for VA benefits. This, of course, is to be 
expected. Items that were infrequently reported or had low effective values tended to 
have low budgetary impact. 
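
The arithmetic behind this comparison can be checked directly. The sketch below multiplies the share of applicants reporting each benefit by its mean value, using the figures quoted in the text. The function name and the notion of a per-applicant "exposure" are illustrative assumptions; they are not part of the Pell formula or of the simulation model, and they ignore how the award formula actually treats each item.

```python
def exposure_per_applicant(share_reporting: float, mean_value: float) -> float:
    """Share of applicants reporting an item times the item's mean reported value.

    A rough, hypothetical proxy for how much of a benefit is in play per
    applicant; it does not reproduce the award formula.
    """
    return share_reporting * mean_value


# Figures quoted in the text above.
va_exposure = exposure_per_applicant(0.02, 3_200)   # veteran's education benefits
ss_exposure = exposure_per_applicant(0.11, 5_390)   # social security education benefits

print(f"VA: ${va_exposure:,.2f} per applicant")
print(f"Social security: ${ss_exposure:,.2f} per applicant")
print(f"Ratio: {ss_exposure / va_exposure:.1f} to 1")
```

On these figures the social security item carries roughly nine times the per-applicant exposure of the VA item, which agrees in direction with the budgetary results without being a substitute for them.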

Aggregate Distributional Impact 

The impact on the distribution of awards resulting from changes to the eligibility 
and award formulae is of fundamental importance to any analysis on the impact of 
data elements. Particularly since the impetus for data element reduction is the 
reduction of error, rather than redirecting program funds, the elimination of data 
elements from the formulae must have as a constraint minimizing redistributive 
effects induced by these changes. Therefore, a particularly relevant question for this 
analysis is: What is the impact on the distribution of awards of eliminating each of the 
17 data elements? Data elements that have high redistributional impact on program 
funds would likely be retained; those that have low redistributional effects would be 
candidates for elimination. 

This distributional analysis was conducted by comparing the applicant's original
award under the full formula with the award when the respective data element was
removed from the formula. The results of these comparisons, for presentational
purposes, were tabulated by percentage of applicants who experienced no change in
award (+/-$100) and two levels of increases and decreases ($101-$600 and over $600)
and ranked from highest to lowest impact. Those data items that induced the largest
number of increased and/or decreased awards were ranked as having the highest
distributional impact. Conversely, the data items that caused the fewest changes in
awards were ranked as low impact.

Table 3 presents the results of this distributional analysis and an ordinal ranking 
of the distributional impact of each individual data element. In addition, the 
distributional effects are classified as high, moderate, or low in the following manner:

• High: Greater than 10 percent of the applicants would receive a different
award (different by more than $100) when compared with the original
award.

• Moderate: Greater than 5 percent but less than 10 percent of the
applicants would receive a different award (different by more than $100)
when compared with the original award.

• Low: Less than 5 percent of the applicants would receive a different
award (different by more than $100) when compared with the original
award.
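
The classification rule above can be expressed as a short sketch. The award figures below are hypothetical, the function names are illustrative, and the handling of values that fall exactly on 5 or 10 percent (which the definitions leave open) is an assumption noted in the code.

```python
def percent_changed(original_awards, new_awards, tolerance=100):
    """Percent of applicants whose award differs by more than `tolerance` dollars."""
    if len(original_awards) != len(new_awards):
        raise ValueError("award lists must be the same length")
    changed = sum(
        1 for old, new in zip(original_awards, new_awards)
        if abs(new - old) > tolerance
    )
    return 100.0 * changed / len(original_awards)


def distributional_class(pct):
    """Map a percent-changed figure onto the high/moderate/low classes.

    Assumption: values of exactly 5 or 10 percent fall into the lower class.
    """
    if pct > 10:
        return "high"
    if pct > 5:
        return "moderate"
    return "low"


# Hypothetical awards for 20 applicants, before and after removing an element.
original = [1000] * 20
recomputed = [1000] * 17 + [1100, 1250, 400]   # the $100 change is within tolerance

pct = percent_changed(original, recomputed)
print(pct, distributional_class(pct))
```

Only two of the twenty hypothetical awards move by more than $100, so the element in this toy example would change awards for 10 percent of applicants.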

Several conclusions can be drawn from the table about the distributional impact 
of individual data elements. Only three data elements cause redistribution for more 
than ten percent of all applicants (family size, adjusted gross income and U.S. taxes 
paid) and therefore could be considered to have high impact. Four more data elements 
can be classified as having moderate impact, causing redistribution in between five 
and ten percent. Ten data elements have a redistributive impact for less than five 
percent and are considered to have low impact. Six of these 10 low impact data 
elements cause redistribution for less than one percent of all applicants. 



The preceding measures assess the impact of eliminating data elements at a
program-wide or aggregate level. Although this assessment is fundamental to any 
analysis of changes to the Pell formulae, other dimensions of the impact cannot be 
overlooked, including the effects of the change in awards of individual applicants. 



Sensitivity 
TABLE 3

RANKING OF THE IMPACT ON DISTRIBUTION OF AWARDS OF ELIMINATING INDIVIDUAL
DATA ELEMENTS FROM THE ELIGIBILITY AND AWARD FORMULAE
RANKED FROM HIGHEST TO LOWEST IMPACT

                                                Increase           No Change        Decrease
                                           Over $400   $101-$400  (+/- $100)   $101-$400   Over $400
Impact / Data Element Eliminated                 (%)         (%)         (%)         (%)         (%)

HIGH
  Family Size Offset                               0           0       49.41       19.89       30.70
  Adjusted Gross Income                        32.24        9.84       57.87           0           0
  U.S. Taxes Paid                                  0           0       85.19       14.64        0.15

MODERATE
  Employment Expense Offset*                       0           0       91.11        8.82        0.07
  Number in College                                0           0       91.90        7.17        0.93
  Social Security Education Benefits            4.08        1.93       91.99           0           0
  Net Home Equity                               1.82        4.70       93.47           0           0

LOW
  Nontaxable Income                      [illegible]        3.14 [illegible]           0           0
  Dependent Student's Income                    1.38        1.59       97.03           0           0
  Dependent Student's Net Assets                0.24        2.57       97.20           0           0
  Elementary and Secondary Tuition                 0           0       98.74        1.20        0.04
  Veteran's Education Benefits                  0.27        0.47       99.27           0           0
  Student's Marital Status                         0        0.43       99.29        0.04        0.04
  Cash/Savings/Checking                         0.07        0.50       99.43           0           0
  Net Real Estate/Investment Equity             0.16        0.32       99.53           0           0
  Net Business/Farm Equity                      0.14        0.11       99.72           0           0
  Unusual Medical and Dental Expenses              0           0       99.88        0.09        0.02

*Not an application item, computed from income portions.
Therefore, this analysis explored another research question: How do individual awards
change for applicants facing particular circumstances when a data element, included
in the formula to sensitize the award to such circumstances, is removed?

Elimination of data elements from the formulae can have a substantial effect on 
the sensitivity of the formulae to specific groups of applicants, an important 
component of equity. Equity, as it is used in this context, can be stated simply as 
equal treatment of equals. The Pell formulae (eligibility and award) have many
components that potentially enhance sensitivity (the ability to account for differences
among applicants) and thereby equity.

Elimination of data elements can decrease sensitivity by reducing the ability to 
differentiate among applicants. In addition, elimination of certain data elements will 
affect sensitivity to a greater degree than others. For example, the elimination of the 
family size offset would certainly have a greater impact on sensitivity than the
elimination of medical/dental expenses, since the former decreases discretionary 
income by approximately $1,200 for each additional family member from a base of 
$4,200 and the latter reduces discretionary income by the amount of expenses in
excess of 20 percent of effective income (all income minus taxes). Those data 
elements that are included in the formula to enhance sensitivity but have little impact 
on awards, even for applicants at the upper ranges of the data value, would be
candidates for elimination on the basis of sensitivity. 

We have measured the impact on sensitivity of awards to the individual data
elements by identifying the upper range of data values,¹ eliminating the value and
recomputing the award for this subsample of cases. Table 4 lists the data values for
these ranges. The upper range of each value was selected because the elimination of
the data elements would show the greatest impact at that level.



¹ The range selected for most data elements was the 90th to 95th percentile.
This measures the maximum impact of the data element on the award while avoiding
biasing the measure by including outliers. For several data items (elementary and
secondary tuition, net business/farm equity, net investment equity and veteran's
educational benefits) the values between the 90th and 95th percentile were zero;
consequently, we measured award changes for values between the 95th and 99th
percentile.
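
The subsample selection described above might be sketched as follows. This is an illustration only: the function name is hypothetical, the percentile method is the default of Python's `statistics.quantiles`, and the rule that a case belongs to the band when it lies above the lower bound and at or below the upper bound is an assumption, since the text does not say how boundary cases were treated.

```python
import statistics


def upper_range_subsample(values):
    """Return (low, high, cases) for the upper range of a data element.

    Uses the 90th to 95th percentile band; if that band is all zeros, falls
    back to the 95th to 99th percentile band. Assumption: a case belongs to
    the band when low < value <= high.
    """
    cuts = statistics.quantiles(values, n=100)   # cuts[k - 1] is the k-th percentile
    p90, p95, p99 = cuts[89], cuts[94], cuts[98]
    low, high = (p90, p95) if p95 > 0 else (p95, p99)
    cases = [v for v in values if low < v <= high]
    return low, high, cases


# Hypothetical element reported by everyone: values 1 through 1,000.
dense = list(range(1, 1001))
dense_low, dense_high, dense_cases = upper_range_subsample(dense)

# Hypothetical element reported by few applicants: mostly zeros.
sparse = [0] * 960 + list(range(1, 41))
sparse_low, sparse_high, sparse_cases = upper_range_subsample(sparse)

print(len(dense_cases), len(sparse_cases))
```

For the sparse element the 90th to 95th percentile band is entirely zero, so the sketch moves to the 95th to 99th percentile band, as the footnote describes for items such as elementary and secondary tuition.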



TABLE 4

VALUES FOR DATA ELEMENTS USED
IN THE SENSITIVITY ANALYSIS

                                             Range of Data Values¹
Data Element                                       Low          High

Adjusted Gross Income                      [illegible]   [illegible]
Social Security Education Benefits               1,005         4,963
Net Home Equity                                 38,220        49,879
U.S. Taxes Paid                            [illegible]         5,351
Family Size                                          6             7
Employment Expense Offset                        1,500         1,500
Number in College                                    2             4
Nontaxable Income                                5,078         7,932
Veteran's Education Benefits²                        1       [?],699
Elementary and Secondary Tuition²                  563         2,052
Dependent Student's Net Assets                     159           533
Net Investment Equity²                           6,882   [illegible]
Dependent Student's Income                       2,387       3,69[?]
Student Marital Status                         married       married
Cash/Savings/Checking                            3,001         6,103
Unusual Medical/Dental Expenses                  1,139         1,629
Business/Farm Equity²                      [illegible]        77,730

¹ All values are in the 90th to 95th percentile range unless otherwise noted.

² These values are in the 95th to 99th percentile range because the values in the
90th to 95th percentile range were zero.



It should be noted that we measured sensitivity for all data elements with the
single exception of dependency status, which posed methodological problems. Clearly,
the elimination of several of these, such as AGI, would not seriously be considered, 
since this would alter the fundamental nature of Pell as a need-based student aid 
program. Nevertheless, these elements were included in the analysis in order that the 
methodology be comprehensive, and the ranking of the elements be accurate. 

Table 5 presents the results of this analysis of sensitivity. The table ranks the 
data elements on the basis of absolute percent change in award. In addition, the 
sensitivity of the data element is classified as high, moderate, or low in the
following manner:

• High: 50 percent or greater change in mean award

• Moderate: 10 percent or greater but less than 50 percent change in mean
award

• Low: Less than 10 percent change in mean award.

Table 5 contains several columns: the base or original award, the marginal
award recomputed with the respective data element eliminated, the change in award
or difference between the two, and percent change in award. The change in award
represents the sensitivity of the award to the data element measured in dollars. The
percent change in award represents the change in award as a percentage of the mean
baseline award. The data items are ranked on the basis of absolute percentage change
in award from highest (AGI, 1,507 percent) to lowest (business/farm equity, 0.1
percent), ignoring the direction of the change. Items were ranked by absolute change
because it was assumed that increases and decreases have equal weight; that is, one is
not preferable to the other from the perspective of sensitivity. The data in Table 5
suggest that, given the methodology, awards are most sensitive to the high impact
elements, including AGI, social security education benefits, net home equity, U.S.
taxes paid, and family size. The relatively low mean baseline award for AGI ($81)
results from the fact that few applicants with AGIs within the 90th to 95th percentile
receive awards. Thus, the mean or average award is depressed by the large number of
zero awards in that range of AGI values. When AGI is eliminated from the formula,
awards increase dramatically, because of the nature of the formula. Awards have
relatively high sensitivity to social security education benefits because these benefits
directly reduce the award, since they are part of the award formula.
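
The percent change and the resulting classification can be reproduced from the base and marginal mean awards. The sketch below uses three rows as reported in Table 5; the function names are illustrative, and treating a change of exactly 10 percent as moderate is an assumption.

```python
def percent_change(base_mean, marginal_mean):
    """Change in mean award as a percentage of the mean baseline award."""
    return 100.0 * (marginal_mean - base_mean) / base_mean


def sensitivity_class(pct):
    """Classify by absolute percent change, ignoring direction.

    Assumption: a change of exactly 10 percent counts as moderate.
    """
    size = abs(pct)
    if size >= 50:
        return "high"
    if size >= 10:
        return "moderate"
    return "low"


# (base mean award, marginal mean award) as reported in Table 5.
rows = {
    "Adjusted Gross Income": (81.54, 1310.40),
    "Employment Expense Offset": (89.74, 65.41),
    "Net Business/Farm Equity": (603.24, 603.82),
}
for name, (base, marginal) in rows.items():
    pct = percent_change(base, marginal)
    print(f"{name}: {pct:,.2f} percent ({sensitivity_class(pct)})")
```

Because the classification uses the absolute value, the 27 percent decrease for the employment expense offset is treated the same way as a 27 percent increase would be.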
TABLE 5

SENSITIVITY OF AWARD TO THE ELIMINATION OF INDIVIDUAL
DATA ELEMENTS BY DATA ELEMENT

                                              Base      Marginal     Change in       Percent
                                            Award¹        Award²         Award        Change
Sensitivity / Data Item                     (Mean)                (Difference)      in Award

HIGH
  Adjusted Gross Income                      81.54      1,310.40      1,228.86      1,507.06
  Social Security Education Benefits        315.52        928.54        613.02        194.29
  Net Home Equity                           171.84        344.09        172.25        100.24
  U.S. Taxes Paid                            58.45          8.85        -49.60        -84.86
  Family Size Offset                        606.15        213.58       -392.57        -64.76

MODERATE
  Employment Expense Offset                  89.74         65.41        -24.33        -27.11
  Number in College                         579.27        478.12       -101.15        -17.46
  Nontaxable Income                         569.52        647.70         78.18         13.73
  Veteran's Education Benefits              676.96        760.08         83.12         12.28
  Elementary and Secondary Tuition          452.36        403.81        -48.55        -10.73
  Dependent Student's Net Assets            323.01        356.58         33.57         10.39

LOW
  Net Investment Equity                     270.97        292.31         21.34          7.88
  Dependent Student's Income                401.06        425.69         24.63          6.14
  Student's Marital Status                  755.95        769.03         13.08          1.73
  Cash/Savings/Checking                     267.89        271.36          3.47          1.30
  Unusual Medical/Dental Expenses           335.53        334.90         -0.63         -0.19
  Net Business/Farm Equity                  603.24        603.82          0.58          0.10

¹ Original award computed with all data elements.

² Award computed with the respective data element eliminated.
Awards are moderately sensitive to six data elements ranging from employment
expense offset (-27 percent) to dependent student's net assets (10 percent). Awards
are relatively insensitive to another six elements. These range from net investment
equity (8 percent) to business/farm equity (less than 1 percent).

Reliability

Program integrity is a fundamental design and program goal. In fact, if the data 
collected are not accurate and reliable, other program goals are undermined. 
Consequently, the reliability of applicant data is a relevant, if not essential, 
component of any assessment of the impact of individual data elements. We addressed 
this dimension of the analysis by posing the question: How accurately do
applicant-reported data represent an applicant's true characteristics?

The reliability of data elements was assessed through the use of the Pell Grant
Quality Control Project Stage III data. We have defined reliability as the discrepancy
rate found in Stage III. Two error rates were developed in this study: simple case 
discrepancy and case discrepancy with payment consequences. Case discrepancy 
occurs when true or validated data differ from application data used in the 
determination of Pell eligibility and award. Case discrepancy leads to payment 
consequences when the validated data result in a different award than calculated with 
original application data. Table 6 presents the discrepancy rates under both 
definitions and the ordered ranking for both. The data elements are ordered by case 
discrepancy rate. This rate was selected because it is more reliable since the other 
rate is formula specific and would change under a different formula. Thus, the former 
is more generalizable. 

Data elements are also classified into groups of high, moderate, and low 
reliability items. This classification is the obverse of the error rate: the lower the 
error rate, the higher the reliability. The classification is as follows: 

• High: Less than 5 percent cases discrepant

• Moderate: 5 to 10 percent cases discrepant

• Low: Greater than 10 percent cases discrepant
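
The two error rates and the reliability classes can be illustrated with a small sketch. The case records below are hypothetical, the `Case` structure and function names are assumptions made for the example, and the $2 tolerance is the plus-or-minus range that the text attributes to the Pell QC Project.

```python
from dataclasses import dataclass

TOLERANCE = 2  # dollars; the plus-or-minus range the text cites from the Pell QC Project


@dataclass
class Case:
    reported: float         # value on the application
    validated: float        # "true" value found in validation
    award_reported: float   # award computed from application data
    award_validated: float  # award computed from validated data


def discrepancy_rates(cases):
    """Return (case discrepancy %, discrepancy with payment consequences %)."""
    discrepant = [c for c in cases if abs(c.validated - c.reported) > TOLERANCE]
    with_payment = [c for c in discrepant if c.award_validated != c.award_reported]
    return (100.0 * len(discrepant) / len(cases),
            100.0 * len(with_payment) / len(cases))


def reliability_class(case_rate):
    """Lower case discrepancy rate means higher reliability."""
    if case_rate < 5:
        return "high"
    if case_rate <= 10:
        return "moderate"
    return "low"


# Ten hypothetical recipients reporting one data element.
cases = [Case(500, 500, 900, 900) for _ in range(7)]
cases.append(Case(500, 501, 900, 900))    # within tolerance: not a discrepancy
cases.append(Case(500, 1500, 900, 900))   # discrepant, but the award is unchanged
cases.append(Case(500, 2500, 900, 800))   # discrepant, and the award changes

case_rate, payment_rate = discrepancy_rates(cases)
print(case_rate, payment_rate, reliability_class(case_rate))
```

The ninth case shows why the two rates can diverge: a reported value can be wrong without changing the award, for example when the item is protected or taxed at a low rate.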

TABLE 6

RELIABILITY OF DATA ELEMENTS USED IN THE
PELL GRANT FORMULAE RANKED FROM MOST TO LEAST RELIABLE

                                                                  Cases with
                                                                  Discrepancies
                                            Cases with            Resulting in
                                            Discrepancies         Payment Consequences
Reliability / Data Item                            (%)    Rank           (%)    Rank

HIGH
  Business/Farm Equity¹                            1.0       1           0.1       1
  Veteran's Education Benefits             [illegible]       2           0.6       3
  Net Investment Equity                            2.1       3           0.3       2
  Elementary and Secondary Tuition                 2.3       4           0.7       4

MODERATE
  Social Security Education Benefits               5.2       5           2.6       9
  Student's Marital Status                         9.9       6           3.2      10

LOW
  Net Home Equity¹                                10.7       7           1.8      13
  U.S. Taxes Paid                                 14.1       8           3.5      10
  Number in College                        [illegible]       9         [?].9      12
  Adjusted Gross Income                           16.4      10   [illegible]      11
  Employment Expense Offset²                      17.7      11           1.5       7
  Family Size Offset                              22.4      12          10.1      15
  Unusual Medical/Dental Expenses                 23.2      13           0.9       6
  Nontaxable Income                               30.6      14          10.0      14
  Dependent Student's Assets                      35.1      15          18.1      17
  Dependent Student's Income                      37.0      16          14.5      16
  Cash/Savings/Checking                    [illegible]      17           0.8       5

¹ Estimate, computed from error rates for assets and debts.

² Estimate, computed from the error rate for income portions.
Four items in Table 6 have high reliability and their discrepancy rates and
rankings are similar. Two are moderately reliable, although the rankings begin to
diverge slightly for these items. Eleven items are classified as having low reliability
based on case discrepancy rate. These range from net home equity (about 11 percent)
to the least reliable, on this scale, cash/savings/checking (about 46 percent). Four
items have low reliability using both rates: dependent student assets and income,
nontaxable income and family size.

The rates differ because of the nature of the formula. Clearly, the more
directly a change in the data element produces a change in award, the closer the rates
and ranking. Many elements, such as cash/savings/checking, dependent student's 
assets and income, are subject to protections and taxed at a low rate; thus, the 
differences between the rates and rankings are wider. 

Several observations should be made concerning these data and the case discrepancy
rate. First, the data are recipient data. We are consciously generalizing from 
recipient to applicant behavior. We believe this is sound because no data suggest that 
applicant and recipient misreporting behavior is different. In fact, the Title IV Quality 
Control Project, which examined error in the Campus-Based and Guaranteed Student 
Loan Program and included many Pell applicant non-recipients, reports error patterns 
generally similar to the Pell QC Project. Second, the discrepancy rate represents the 
rate at which the true or validated data values differed from reported values by more 
than plus or minus $2, the range specified by ED in the Pell QC Project. Third, the 
rate includes zero and non-zero reported values. Since the discrepancy reflects both 
values, the rates are themselves an artifact of the occurrence of this characteristic in 
the general population. For instance, if a small percentage of the population has 
business/farm equity, the error rate inherently will be lower than for AGI or
nontaxable income. This occurs because, among other reasons, nonbusiness/farm 
owner applicants implicitly report zero values. Thus, there is a lower probability of 
error in the general population. 

Verifiability 

The final dimension on which the data elements were evaluated is verifiability. 
Verifiability is a corollary of reliability and a logical and important policy concern in 
any systematic evaluation of data elements. We focused our analysis by addressing the
question: To what degree can the data element be corroborated through an alternate 
source of documentation? 



Our assessment of the degree to which data elements can be verified is 
essentially qualitative. The assessment draws upon a rich body of qualitative data 
developed through the fall 1982 study of Pell validation compliance and particularly
the "best value" selection software for the Pell and Title IV QC projects. The research 
that produced the best value selection software and documentation represents one of 
the most thorough reviews of corroborative documentation for data items used in the 
Pell formula. These data informed our assessment of individual data items. 



Each item was analyzed from five perspectives: 



• Is a reliable corroborative data source available for each item? 

In answering this question, we essentially asked whether a document 
existed with which the data item could be verified and which was produced 
by an "official," neutral third party. We also considered whether the data 
from this document treated the time period and used the same general 
definition for the data item as the formula. 

• Is the document readily available? 

In assessing the data element from this perspective we considered whether 
most families have and maintain this documentation. Conversely, if 
families must request the document often, we considered whether it was 
easily obtained. The experience of our staff's fieldwork with financial aid
staff was used extensively in this analysis. 

• Is the document provided quickly? 

Here we evaluated whether the agencies (companies, etc.) from which a 
family would have to request a document(s) provide these in a timely 
manner. We also called upon staff experience with financial aid officers, 
and their experiences, to conduct this evaluation. 

• Is the data retrospective? 

We assessed whether the data used in the formula was retrospective (e.g., 
prior or base year AGI), which can be verified more easily. 

• Can errors of omission as well as commission be detected? 

Lastly, we evaluated the degree to which failing to report as well as under 
or overreporting could be identified. 
These five questions focused our assessment of the individual data elements.
Once each data item was evaluated, we ordinally ranked the items. Ranking took
place in several stages. Each of the questions discussed above was weighted equally,
except omission/commission, which doubled the element's score if both errors could be
detected. Each of the data elements received one of five assessments (yes, reliable
approximation, uncertain, often no, no). Each of these was weighted on a symmetrical
scale from +2 for yes to -2 for no. The elements were then classified into high,
moderate, and low verifiability as follows:

• High: Three or more yeses and both omission/commission (a score of
greater than 10)

• Moderate: Between two yeses and both omission/commission, and three
yeses (a score of between 6 and 10)

• Low: Fewer than three yeses (a score of less than 6)
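
The scoring steps above can be sketched in a few lines. The intermediate point values (+1 for reliable approximation, 0 for uncertain, -1 for often no) are an assumption: the text gives only the endpoints of the symmetrical scale. The function names are illustrative, and the assessments are those shown for three items in Table 7.

```python
# Assumed points for the five assessments; only +2 and -2 are stated in the text.
SCORES = {
    "yes": 2,
    "reliable approximation": 1,
    "uncertain": 0,
    "often no": -1,
    "no": -2,
}


def verifiability_score(answers, detects_both):
    """Sum the four question scores; double the sum if both omission and
    commission errors can be detected."""
    total = sum(SCORES[a] for a in answers)
    return total * 2 if detects_both else total


def verifiability_class(score):
    if score > 10:
        return "high"
    if score >= 6:
        return "moderate"
    return "low"


# Answers in the order: reliable source, readily available, provided quickly,
# retrospective. The flag records whether both error types can be detected.
items = {
    "Adjusted Gross Income": (["yes", "yes", "yes", "yes"], True),
    "Cash/Savings/Checking": (["reliable approximation", "yes", "yes", "yes"], False),
    "Unusual Medical/Dental Expenses": (["no", "no", "no", "yes"], False),
}
for name, (answers, both) in items.items():
    score = verifiability_score(answers, both)
    print(f"{name}: {score} ({verifiability_class(score)})")
```

Under these assumed point values the three items score 16, 7, and -4, which places them in the high, moderate, and low classes respectively.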

Table 7 presents the results of the evaluation. Four elements are classified as 
having high verifiability; four as moderate. Nontaxable income is ranked by the 
composite of its subcomponents, which are examples of the types of income that are 
included in this data element. 

The verifiability for the remaining data elements is classified as low. Generally 
these are asset items (home, business/farm, and investment equity and dependent 
student assets), demographic items (family size, number in college and student's 
marital status) and expenses (medical/dental). Assets receive low scores because of 
the difficulty of establishing value, the relative difficulty in discovering errors of 
omission and the potential difficulty of rapidly providing up-to-date documentation. 
Two of the demographic items, family size and number in college, are prospective and 
therefore virtually unverifiable, although number of exemptions can be used as a 
reasonable approximation, acknowledging the limitation of such comparisons. 
Student's marital status is difficult to verify because almost nothing short of a 
marriage license can conclusively prove the student's status. Therefore, no other 
documentation can be considered reliable (e.g., tax forms). Medical/dental expenses 
may be difficult to verify simply because of the potential volume and diversity of 
documentation and payment forms. 
TABLE 7

VERIFIABILITY OF DATA ELEMENTS USED IN THE PELL FORMULAE

Classification / Item                 Reliable                Readily       Provided                   Omission/
                                      Source                  Available     Quickly     Retrospective  Commission

HIGH
  Adjusted Gross Income               Yes                     Yes           Yes         Yes            O/C
  Employment Expense Offset           Yes                     Yes           Yes         Yes            O/C
  U.S. Taxes Paid                     Yes                     Yes           Yes         Yes            [illegible]
  Veteran's Education Benefits        Yes                     Yes           Yes         No             O/C

MODERATE
  Social Security Education Benefits  Yes                     Yes           Yes         Yes            C
  Dependent Student's Income          Yes                     Yes           Yes         Yes            C
  Cash/Savings/Checking               Reliable Approximation  Yes           Yes         Yes            C
  Nontaxable Income
    Social Security Benefits          Yes                     Yes           Uncertain   Yes            C
    AFDC                              Yes                     Uncertain     Often No    Yes            C
    Child Support                     Often No                Often No      Uncertain   Yes            C
    Welfare                           Yes                     Uncertain     Uncertain   Yes            C
    Unemployment                      Yes                     Yes           Uncertain   Yes            O/C
    Railroad Retirement Benefits      Yes                     Yes           Yes         Yes            C
    Disability Income                 Yes                     Yes           Yes         Yes            C
    Veteran's Benefits                Yes                     Yes           Yes         Yes            C
    Interest from Tax Free Bonds      Yes                     Yes           Uncertain   Yes            [illegible]

LOW
  Elementary/Secondary Tuition        Yes                     Yes           Uncertain   Yes            C
  Dependent Student's Net Assets      Reliable Approximation  Yes           Uncertain   Yes            C
  Net Home Equity                     Reliable Approximation  No/Uncertain  Often No    Yes            C
  Net Investment Equity               Reliable Approximation  Uncertain     Often No    Yes            C
  Net Business/Farm Equity            Reliable Approximation  Uncertain     Often No    Yes            C
  Unusual Medical/Dental Expenses     No                      No            No          Yes            C
  Student's Marital Status            No                      No            No          Yes            [illegible]
  Family Size Offset*                 No                      No            No          No             O/C
  Number in College*                  No                      No            No          No             C

*Prospective items; evaluation in future years.
Joint Consideration of the Measures

The analyses presented in the prior sections of the chapter provide the data with 
which to evaluate the impact of individual data elements across several measures. 
However, we have assumed that decisions concerning the elimination of data elements 
cannot be made on the basis of any single measure or dimension. Consequently, our 
approach has assumed that it is necessary to jointly consider the impact of data
elements across these five dimensions. Such an integration, however, confronts
fundamental policy questions, for instance concerning the relative importance of each
of the measures, which only ED policymakers can address. Fully acknowledging this
fact and the fact that policymakers may differ concerning the relative importance, 
our approach to integrating the results of the discrete analyses is two-fold. First, we 
present a framework that allows ED policymakers to make individual judgments about 
the impact of data elements. Second, we provide two examples of how such judgments 
can be made within this framework. 

There are numerous ways to classify the data elements across the five measures.
For brevity's sake, we have chosen only two as examples. Table 8 presents the first
such example. In this first example we assume that each of the measures has equal
importance and therefore high budgetary impact is equally as important as high
reliability and verifiability. In addition, for simplicity's sake, we have grouped the
data elements by assigning values to high, moderate, and low scores (2, 1 and 0,
respectively) on each of the measures and divided the elements into three
approximately equal high, moderate, and low classes. Those elements classified as
high on average have the highest impact across the five measures; conversely, those
classified as low have the lowest. We have assumed that one would approach the
elimination of data elements by beginning with data elements in the low joint
classification and considering whether the elimination of each data element requires
too substantial a tradeoff.
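
A minimal sketch of the equal-weight scoring follows. It assigns 2, 1, and 0 points to high, moderate, and low ratings and sums them over the five measures. The function name is illustrative, and the ratings shown are those reported for four of the elements in this chapter.

```python
POINTS = {"high": 2, "moderate": 1, "low": 0}


def joint_score(ratings):
    """Sum the 2/1/0 points over the five measures, each with equal weight."""
    return sum(POINTS[r] for r in ratings)


# Ratings in the order: budgetary impact, distributional impact, sensitivity,
# reliability, verifiability.
elements = {
    "Unusual Medical/Dental Expenses": ["low", "low", "low", "low", "low"],
    "Number in College": ["moderate", "moderate", "moderate", "low", "low"],
    "Veteran's Education Benefits": ["low", "low", "moderate", "high", "moderate"],
    "Adjusted Gross Income": ["high", "high", "high", "low", "high"],
}

# Lowest joint score first: the order in which elimination would be considered.
for name in sorted(elements, key=lambda n: joint_score(elements[n])):
    print(f"{joint_score(elements[name])}  {name}")
```

Sorting from the lowest score upward mirrors the approach described above, in which elimination is considered first for elements in the low joint classification.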

One of the seven data elements in the low joint classification (medical/dental 
expenses) received low classification across all of the measures. Dependent student's 
income had moderate budgetary impact and verifiability. Dependent student's net 
TABLE 8

EXAMPLE OF JOINT RANKING OF THE DATA ELEMENTS
ASSIGNING EQUAL WEIGHTS TO EACH MEASURE

                                      Budgetary            Distributional
Classification / Data Element         Impact               Impact               Sensitivity          Reliability          Verifiability
                                      (Weight=1)           (Weight=1)           (Weight=1)           (Weight=1)           (Weight=1)
                                      ($ Million)                               (% Δ in award)       (% w/error)          (Rank)

HIGH
  Adjusted Gross Income               High (1,708)         High (42)            High (1,507)         Low (16)             High (1)
  Social Security Education Benefits  High (276)           Moderate (8)         High (194)           Moderate (5)         Moderate (5)
  U.S. Taxes Paid                     Moderate (-155)      High (15)            High (-85)           Low (14)             High (3)
  Family Size Offset                  High (-1,455)        High (51)            High (-65)           Low (22)             Low (16)
  Employment Expense Offset           Moderate [illegible] Moderate (10)        Moderate (-27)       Low (18)             High (3)

MODERATE
  Net Home Equity                     Moderate (117)       Moderate (7)         High (100)           Low (10.7)           Low (11)
  Number in College                   Moderate (-100)      Moderate (8)         Moderate (-17)       Low (14)             Low (17)
  Nontaxable Income                   Moderate (90)        Low (5)              Moderate (14)        Low (31)             Moderate (8)
  Veteran's Education Benefits        Low (13)             Low (1)              Moderate (12)        High [illegible]     Moderate (4)
  Elementary and Secondary Tuition    Low (-13)            Low (1)              Moderate (-11)       High [illegible]     Moderate (9)

LOW
  Dependent Student's Net Assets      Low (35)             Low (3)              Moderate (10)        Low (35)             Low (10)
  Net Investment Equity               Low (10)             Low (*)              Low (8)              High (2)             Low (12)
  Dependent Student's Income          Moderate (71)        Low (3)              Low (6)              Low (37)             Moderate (6)
  Net Business/Farm Equity            Low (8)              Low (*)              Low (*)              High (1)             Low (13)
  Student's Marital Status            Low (5)              Low (1)              Low (2)              Moderate (10)        Low (15)
  Cash/Savings/Checking               Low (8)              Low (1)              Low (1)              Low (46)             Moderate (7)
  Unusual Medical/Dental Expenses     Low (-2)             Low (*)              Low (*)              Low (23)             Low (14)

* [illegible] percent.
assets had moderate sensitivity and cash/savings/checking had moderate verifiability.
Student's marital status had moderate reliability. Net investment and business/farm 
equity both were classified as having high reliability. Thus, all seven could be 
reasonably considered for elimination under this classification. 

For the data items in the moderate joint classification, consideration of 
eliminating them from the Pell formulae becomes a process of dealing with the 
tradeoffs among measures. Veteran's benefits and elementary and secondary tuition 
have identical impact across all measures, having low budgetary and distributional 
impact, moderate sensitivity and verifiability and high reliability. Number in college
has moderate budgetary and distributional impact and sensitivity, and low reliability
and verifiability. Nontaxable income has moderate budgetary impact, sensitivity and
verifiability and low distributional impact and reliability. Net home equity has
moderate budgetary and distributional impact, high sensitivity, but low reliability and
verifiability.

The remaining items (AGI, social security education benefits, U.S. taxes, family 
size, and employment expense offset) have the highest impact across the five 
measures. Within this framework, these items could not be eliminated without a major 
impact on the program. 

The above discussion is an example of how a policymaker might integrate these 
data given the weighting and classification. Alternative weights could be assigned to 
each measure, suggesting that some of the measures, such as budgetary impact, are 
more important than others. In the second example of integrating the scores from the 
individual measures, we have selected budgetary impact as most important, 
distributional impact and sensitivity as more important and reliability and verifiability
as less important. Thus, we have assigned a weight of three to budgetary impact, a
weight of two to distributional impact and sensitivity and a weight of one to reliability
and verifiability. Effectively this means that budgetary impact has three times the
weight of verifiability, implying greater importance.

Table 9 presents an example of how this differential weighting affects the
classification of data elements. The classification of the data elements was not
affected by differential weighting. The differential weights may, however, affect
the decision to eliminate an individual data element within a classification. For
example, number in college received moderate classifications on budgetary impact,
distributional impact, and sensitivity and low classifications on reliability and
verifiability. Using equal weights, one might choose to eliminate this item. Assigning
the differential weights, however, may lead one to reconsider the elimination of the
item, since the measures on which the data item received moderate classifications
would be assumed to be more important. Greater changes in classification would
occur as the difference between the highest and lowest weights increases. This
example suggests, however, that classification is relatively unaffected by small
changes in weights.
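
The effect of the two weighting schemes can be illustrated with a short sketch. It
assumes, purely for illustration, that Low, Moderate, and High are scored 1, 2, and 3
and that a joint score is the weighted sum of the five ratings; the report does not
state the scoring rule actually used, and every name below is hypothetical. The
ratings are those given in the text for number in college and veteran's benefits.

```python
# Assumption: numeric scores for the three rating levels (not stated in the report).
SCORE = {"Low": 1, "Moderate": 2, "High": 3}

MEASURES = ("budgetary", "distributional", "sensitivity", "reliability", "verifiability")

# First example: every measure counts equally.
EQUAL_WEIGHTS = {measure: 1 for measure in MEASURES}

# Second example: weights of 3, 2, 2, 1, 1 as described in the text.
DIFFERENTIAL_WEIGHTS = {
    "budgetary": 3,
    "distributional": 2,
    "sensitivity": 2,
    "reliability": 1,
    "verifiability": 1,
}


def weighted_score(ratings, weights):
    """Weighted sum of a data element's ratings across the five measures."""
    return sum(weights[m] * SCORE[ratings[m]] for m in MEASURES)


# Ratings as described in the text, in the order of MEASURES.
number_in_college = dict(zip(MEASURES, ("Moderate", "Moderate", "Moderate", "Low", "Low")))
veterans_benefits = dict(zip(MEASURES, ("Low", "Low", "Moderate", "High", "Moderate")))

for name, ratings in (("Number in college", number_in_college),
                      ("Veteran's benefits", veterans_benefits)):
    print(name,
          weighted_score(ratings, EQUAL_WEIGHTS),
          weighted_score(ratings, DIFFERENTIAL_WEIGHTS))
```

Under these assumed scores, number in college falls below veteran's benefits with
equal weights (8 versus 9) but above it with the differential weights (16 versus 14),
which is the sense in which differential weights can change a decision within a
classification.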

CONCLUSION 

This chapter has presented the results of a systematic analysis of the impact of 
individual data elements designed to provide ED policymakers with the data needed to 
make informed decisions concerning potential data element reduction options. Each 
section has presented the results of analyses on an individual measure. The final 
section presents a framework that policymakers will find useful for integrating these 
individual analyses, which would be necessary to simultaneously consider the measures. 
This section also provides two examples of how the framework could be used,
employing different weighting schemes. The result is a powerful analytic tool for ED 
policymakers to develop and evaluate potential data element reduction proposals. 

A word of caution should be offered concerning the interpretation of the joint 
consideration of measures. The analysis assessed the impact of eliminating individual 
data elements. These results cannot inform policymakers about the cumulative 
effects of eliminating groups of data elements. The following chapter provides an
evaluation of the effects of one such alternative, a five element formula.

TABLE 9

EXAMPLE OF JOINT RANKING OF THE DATA ELEMENTS
ASSIGNING DIFFERENTIAL WEIGHTS TO EACH MEASURE

                            Budgetary       Distributional  Sensitivity     Reliability     Verifiability   Classi-
                            Impact          Impact                                                          fication
                            (Weight=3)      (Weight=2)      (Weight=2)      (Weight=1)      (Weight=1)
                            ($ Million)                     (% change       (% w/error)     (Rank)
                                                            in award)

Adjusted Gross Income       High            High            High            Low             [illegible]     HIGH
                            (1,708)         (42)            (1,307)         (16)

Social Security Education   High            Moderate        High            Moderate        Moderate        HIGH
Benefits                    (276)           (8)             (19^)           (5)             (5)

Family Size Offset          High            High            High            Low             Low             HIGH
                            (-1,455)        (51)            (-65)           (22)            (16)

U.S. Taxes Paid             Moderate        High            High            Low             [illegible]     HIGH
                            (-155)          (15)            (-85)           (14)

Employment Expense Offset   Moderate        Moderate        Moderate        Low             High            HIGH
                            (-80)           (10)            (-27)           (18)            (2)

Net Home Equity             Moderate        Moderate        High            Low             Low             MODERATE
                            (117)           (7)             (100)           (10.7)          (11)

Number in College           Moderate        Moderate        Moderate        Low             Low             MODERATE
                            [illegible]                                                     (17)

Nontaxable Income           Moderate        Low             Moderate        Low             Moderate        MODERATE
                            (90)            (5)                             (31)            (8)

Veteran's Education         Low             Low             Moderate        High            Moderate        MODERATE
Benefits                    (13)            (1)             (12)            (1)             [illegible]

Elementary and Secondary    Low             Low             Moderate        High            Moderate        MODERATE
Tuition                     (-13)           (1)             (-11)           (2)             (9)

Dependent Student's         Low             Low             Moderate        Low             Low             LOW
Net Assets                  (35)            (3)             (10)            (35)            (10)

Net Investment Equity       Low             Low             Low             High            Low             LOW
                            (10)            (•)             (8)             (2)             (12)

Dependent Student's Income  Moderate        Low             Low             Low             Moderate        LOW
                            (71)            (3)             (6)             (37)            (6)

Net Business/Farm Equity    Low             Low             Low             High            Low             LOW
                            (8)             (*)             (•)             (1)             (13)

Student Marital Status      Low             Low             Low             Moderate        Low             LOW
                            (5)             (1)             (2)             (10)            (15)

Cash/Savings/Checking       Low             Low             Low             Low             Moderate        LOW
                            (8)             (1)             (1)                             (7)

Unusual Medical/Dental      Low             Low             Low             Low             Low             LOW
Expenses                    (-2)            (•)             (•)             (23)            (14)

•Less than 1 percent.




3 

ANALYSIS OF A FIVE DATA ELEMENT FORMULA 

The prior chapter presented an analysis of the impact of eliminating individual
data elements from the Pell Grant eligibility and award formulae. This chapter
presents an analysis of one proposal to reduce the number of application data
elements that are used to calculate Pell awards to five. As described in the
Introduction, this analysis is better able to isolate the effects of eliminating data
elements by controlling for reporting error. We have controlled for error by
conducting analyses of a second simulation using a data base from which error has
been eliminated by imputing error patterns found in the Stage III Pell QC data base
to the applicant data base. This imputation procedure is presented in detail in
Technical Appendix B.

DESCRIPTION OF THE SIMULATIONS 

The two simulations conducted by the Division of Policy and Program 
Development in this analysis— the standard and the error free simulations—are 
identical with the exception of the data base used. Each simulation consists of three 
model runs, the first of which develops a baseline measure using the full formula in the 
1982-83 program year. Both simulations then eliminate all but five data elements.
(Dependency status remains in the formula and is not treated explicitly as a data 
element.) These are: 

• Adjusted gross income 

• Federal taxes paid 

• Nontaxable income 

• Number in household 

• Number in postsecondary education. 

Eliminated from the formula are the following income, asset, and expense data (not
necessarily data elements):

• Student/spouse income 

• Net home assets 

• Net farm and business assets 

• Cash, checking, savings 

• Net interest assets 

• Dependent student's assets 

• Offset for unreimbursed elementary and secondary tuition 

• Offset for high medical and dental expenses 

• Employment expense offset 

• Social Security Education Benefits 

• Veteran's Education Benefits.

The second run, which uses a five element formula, is used to estimate the
adjustments to formula "taxation" rates required to maintain budget neutrality.
Budget neutrality was one parameter for analysis specified by ED. Tax rate
adjustments are necessary because reducing the formula to five elements causes the
budget to increase by approximately $130 million. The tables in Appendix D (Tables
D-1 and D-2) display this increase for both data bases when tax rates are not adjusted.
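
One way to picture the second run is as a search for the rate adjustment that returns
total program cost to the baseline. The sketch below is a toy illustration only. It
assumes an award equal to the maximum award minus the family contribution (floored
at zero), and it scales every contribution by a single multiplier, which is a
simplification and not necessarily how the rates were adjusted. The function names,
the applicant tuples, and the bisection search are all assumptions; the report does not
describe the procedure DPPD used.

```python
def total_cost(applicants, multiplier, max_award=1800.0):
    """Weighted program cost when every contribution is scaled by `multiplier`.

    applicants: list of (sampling_weight, contribution_at_standard_rates).
    """
    return sum(
        weight * max(0.0, max_award - multiplier * contribution)
        for weight, contribution in applicants
    )


def neutral_multiplier(applicants, target_budget, low=1.0, high=5.0, tol=1e-9):
    """Bisect for the multiplier at which program cost falls to the target budget."""
    if total_cost(applicants, low) <= target_budget:
        return low  # already within budget; no adjustment needed
    if total_cost(applicants, high) > target_budget:
        raise ValueError("target budget not reachable within the search range")
    # Cost never rises as the multiplier rises, so bisection is valid.
    while high - low > tol:
        mid = (low + high) / 2.0
        if total_cost(applicants, mid) > target_budget:
            low = mid
        else:
            high = mid
    return high


# Hypothetical two-record sample: (sampling weight, contribution in dollars).
toy_sample = [(1.0, 1000.0), (2.0, 500.0)]
print(round(neutral_multiplier(toy_sample, 3000.0), 6))
```

In the toy sample the unadjusted cost is $3,400, and a multiplier of about 1.2 brings
it down to the $3,000 target.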

The third run has taxation rates adjusted to maintain budget neutrality (Table
10). The analysis primarily focuses on the first (full formula) and third (five element
with taxation rate adjustments) runs. This analysis explicitly identifies the effects of
data element reduction using a standard and "error free" data base while maintaining
budget neutrality.
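
The bracketed schedules in Table 10 can be expressed as a short calculation. The
sketch below is illustrative: the rates are those read from Table 10, with bracket
bounds taken as $5,000, $10,000, and $15,000 so that each base amount equals the
contribution at the top of the preceding bracket (an assumption where the printed
table is hard to read). The function name and the treatment of negative discretionary
income are also assumptions.

```python
# Each entry: (lower bound of bracket in dollars, marginal rate in percent).
STANDARD = [(0, 11), (5_000, 13), (10_000, 18), (15_000, 25)]
ADJUSTED = [(0, 13), (5_000, 15), (10_000, 27), (15_000, 30)]


def contribution(discretionary_income, schedule):
    """Contribution from parents' discretionary income under a bracket schedule."""
    income = max(discretionary_income, 0)  # assumption: negative income contributes nothing
    total = 0.0
    for index, (lower, rate_percent) in enumerate(schedule):
        if income <= lower:
            break
        # The upper edge of a bracket is the next bracket's lower bound, if there is one.
        upper = schedule[index + 1][0] if index + 1 < len(schedule) else income
        taxed = min(income, upper) - lower
        total += taxed * rate_percent / 100
    return total


for income in (5_000, 10_000, 12_000, 15_000):
    print(income, contribution(income, STANDARD), contribution(income, ADJUSTED))
```

At a discretionary income of $12,000, for example, the standard schedule gives
$1,200 + 18% of $2,000 = $1,560, and the adjusted schedule gives $1,400 + 27% of
$2,000 = $1,940.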

TABLE 10

TAXATION RATES FOR PARENTS' DISCRETIONARY INCOME
USING BOTH THE STANDARD AND ERROR FREE SIMULATIONS

Discretionary Income     Standard Taxation Rate                   Adjusted Taxation Rate

$0 to $5,000             11% of discretionary income              13% of discretionary income
$5,001 to $10,000        $550 + 13% of amount over $5,000         $650 + 15% of amount over $5,000
$10,001 to $15,000       $1,200 + 18% of amount over $10,000      $1,400 + 27% of amount over $10,000
$15,001 and above        $2,100 + 25% of amount over $15,000      $2,750 + 30% of amount over $15,000

More information concerning the effects of taxation rates can be obtained by
consulting The Pell Grant Formula, 1982-83, U.S. Department of Education, Office of
Student Financial Assistance.

The analysis of both simulations focuses on four policy questions that will assist
OSFA policymakers in evaluating data element reduction as a potential corrective
action. The results from these simulations are compared to assess the effects of data
element reduction under different simulations. These four questions are:

• How do eligibility and awards change when data elements are reduced to 
five? 

• What are the characteristics of those who gain and lose from the program
changes? 

• What are the characteristics of newly-eligible recipients? 

• What are the characteristics of students who lose eligibility? 

These simulations are presented below. 

Standard Simulation Using Reported Data 

DPPD staff conducted a simulation of the effects of reducing the number of data 
elements to five using the standard data base (reported data) holding budget constant. 
The results, organized around the four questions, are as follows: 

How do eligibility and awards change? 

Generally, analysis of the standard simulation indicates that at the highest level
of aggregation, reducing the number of data elements results in very small changes in
the number of recipients, distribution of recipients by income strata, and mean award.
More specifically, the findings indicate that:

• Although the budget remains approximately constant, the adjustment of
taxation rates to maintain a constant budget increases the number of
recipients slightly, by over 50,000 (2 percent), when the number of
data elements is reduced to five (Table 11).

• The proportion of program costs awarded to higher income recipients 
declines slightly. The mean award decreases to $960 from about $980. 

• About 82 percent of those applicants ineligible under the full formula 
remain so under the reduced formula (Table 12). 

• The majority of recipients in most award strata receive the same award
(the center diagonal of Table 12).

• Of those receiving the maximum awards, the neediest students, 91 percent
continue to receive the maximum award and 99 percent receive awards
within $200 of the maximum. Of those recipients receiving an award of
more than $1,400, 92 percent receive more than $1,400 under the five
element formula.

• Of those receiving the lowest awards (not greater than $400), 49 percent
continue to receive an award not greater than $400.

TABLE 11

COMPARISON OF NUMBER OF RECIPIENTS AND PROGRAM COSTS
FOR THE 1982-83 PELL PROGRAM YEAR UNDER THE FULL
AND FIVE DATA ELEMENT FORMULAE USING STANDARD
REPORTED DATA

Full Formula             Five Element Formula

TABLE 12

COMPARISON OF PELL AWARDS FOR THE 1982-83 PROGRAM YEAR
UNDER FULL AND FIVE DATA ELEMENT FORMULAE
USING STANDARD REPORTED DATA

[Table body not legible in the source copy.]

Note on Reading This Table: The above table indicates the percentage of recipients
who receive awards under the reduced formula that are the same, greater or less than
those received under the full formula. The center diagonal lines from top left to
bottom right highlight the percentage of recipients within each award range (e.g.,
$101 - $400) whose award was unchanged under the reduced formula. For example,
about 49% received an award of between $101 and $400 under both formulae. About
24% received less and about 9% received more under the reduced formula. Two
percent of those who received an award between $101 and $400 under the full formula
received between $401 and $700 under the reduced formula. The areas in the upper
right and lower left set off by single diagonal lines indicate the greatest changes in
awards.

Technical Note: The totals in this simulation do not equal the actual number of
applicants because a participation rate (or no show rate) has been applied to all
applicants by adjusting the sampling weight of each applicant. The result is a
reduction in the overall number of applicants to more accurately reflect the number
that become recipients. The number of recipients is accurate.



What Are the Characteristics of Those Who Gain and Lose? 



In general, the following patterns describe those applicants who have their 
awards increased (gainers) or decreased (losers): 



• Most gainers are clustered in the middle of the award range; students 
receiving smaller awards (below $500) are more likely to lose under the
reduced formula than those receiving the higher awards. The neediest 
students, those receiving the highest awards, are least likely to lose 
significant amounts. Relatively few applicants gain or lose extremely 
large amounts (upper right and lower left sections of Table 12). 

• Of those whose awards increase, 66 percent increase by less than $600, 25 
percent increase by $600 - $1,200 and 9 percent by more than $1,200. 

• Those gaining less than $600 had a mean AGI of $12,700 and mean net
assets of almost $40,000; those gaining $600 - $1,200 had a mean AGI of
$12,500 and mean net assets of $54,000; and those gaining over $1,200 had
a mean AGI of $9,000 and mean net assets of $92,000.

• Of those recipients whose awards decrease, almost 98 percent decrease by 
less than $600; about 2 percent decrease by $600 - $1,200 and less than .1 
percent by more than $1,200. 



The following data summarize the percentage of Pell Grant recipients who gain,
lose, and stay the same (within $50) by specific demographic and financial
characteristics under the five data element formula when compared with the current
formula.

                                        Percentage Who   Percentage Who   Percentage Who
                                        Receive a        Receive the      Receive a
                                        Smaller          Same Award       Larger
Characteristics                         Award            (±$50)           Award

• All Applicants                        16               73               11

• Dependent Students with               20               63               17
  Family Size 4 and Under

• Dependent Students with               19               66               15
  Family Size 5 and Over

• Independent Students                  11               86               3

• Families with 1 in                    14               77               9
  Postsecondary Education

• Families with more than 1 in          21               63               16
  Postsecondary Education

• Dependent Students with Net           26               65               [illegible]
  Home Value under $10,000

• Dependent Students with Net           16               64               20
  Home Value over $10,000

• Dependent Students with Family        21               64               15
  Investments under $10,000

• Dependent Students with Family        [illegible]      69               23
  Investments Over $10,000

• Dependent Students with Family        20               65               15
  Business/Farm Value Under $10,000

• Dependent Students with Family        12               59               29
  Business/Farm Value Over $10,000

• Families with No Nontaxable           18               71               11
  Income

• Families with Some Nontaxable         13               76               11
  Income

• Dependent Students with No            17               71               12
  Extraordinary Family
  Medical/Dental Expenses

• Dependent Students with Any           21               61               18
  Extraordinary Medical/Dental
  Expenses

• Student Enrolled Full-Time            16               72               12

• Student Enrolled Less Than            13               82               5
  Full-Time

From these data we can conclude that:



• Almost three-quarters of all applicants would receive the same award
under the reduced formula as under the full formula; one-quarter would
receive a different award.

• The vast majority (86 percent) of independent students are unaffected by
data element reduction.

• Students who fare better than average under data element reduction, as
expected, are those from families with higher home equity, larger
investments, businesses, or farms. These wealth elements are not
considered in the reduced data element formula.

• Students enrolled less than full-time, reflecting a high proportion of
independent students, are less likely to be affected by data element
reduction than are full-time students.
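
The gain, lose, and same tabulation summarized above can be sketched as follows. This
is a minimal illustration, assuming that an award counts as the "same" when the two
formulae differ by no more than $50 and that shares are computed from sampling
weights; the function names and the sample records are hypothetical and are not drawn
from the applicant data base.

```python
from collections import Counter

TOLERANCE = 50  # dollars; a change within this band counts as the "same" award


def classify_change(full_award, reduced_award, tolerance=TOLERANCE):
    """Label the change in award when moving from the full to the reduced formula."""
    difference = reduced_award - full_award
    if difference > tolerance:
        return "larger"
    if difference < -tolerance:
        return "smaller"
    return "same"


def weighted_shares(records):
    """Percentage of weighted applicants in each category.

    records: iterable of (full_award, reduced_award, sampling_weight).
    """
    totals = Counter()
    for full_award, reduced_award, weight in records:
        totals[classify_change(full_award, reduced_award)] += weight
    grand_total = sum(totals.values())
    return {label: 100 * totals[label] / grand_total
            for label in ("smaller", "same", "larger")}


# Hypothetical records: (award under full formula, award under reduced formula, weight).
sample = [(1000, 1040, 2.0), (400, 0, 1.0), (900, 1500, 1.0)]
print(weighted_shares(sample))
```

Applying the same function to a subgroup, such as independent students only, would
yield one row of a table like the one above.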



What Are the Characteristics of Newly Eligible Recipients? 



• An estimated 200,000 applicants who are ineligible under the full formula
would become eligible under the reduced formula.

• Of these newly eligible recipients, half would receive an award of less than
$600, one-third would receive between $600 and $1,200, and one-sixth over
$1,200.

• Those newly eligible recipients gaining less than $600 had a mean AGI of
$20,000 and mean net assets of $57,000; those gaining awards of between
$600 and $1,200 had a mean AGI of $15,000 and mean net assets of
$61,000; and those gaining awards in excess of $1,200 had a mean AGI of
$9,000 and mean net assets of $97,000.



What Are The Characteristics Of Students Who Lose Eligibility? 



• Slightly less than 150,000 students who received awards under the full
formula become ineligible under the reduced formula.

• Of the 360,000 who received an award of less than $401 under the full
formula, 33 percent became ineligible. Almost no one among the 1.2
million students who received in excess of $1,000 under the full formula
became ineligible under the reduced formula.

• Those students who lost an award of less than $600 had a mean AGI of
$24,000 and mean net assets of $1^,000; those who lost an award between
$600 and $1,200 had a mean AGI of $22,000 and mean net assets of $7,000;
and those who lost an award in excess of $1,200 had a mean AGI of $12,000
and mean net assets of $8,000.

"Error Free" Simulation

The Division of Policy and Program Development conducted a second simulation 
of the impact of a reduced formula using a data base to which "best values" were 
imputed. This imputation effectively removed reporting error from the data base and 
permitted a more accurate measurement of the effects of data element reduction as 
distinct from the elimination of error. This simulation focuses on the same four 
questions as the standard simulation. 

How Do Eligibility and Awards Change? 

This simulation also indicates that at the highest level of aggregation, reducing 
the number of data elements, using an error free data base, results in even smaller
changes in the number of recipients, distribution of recipients by income strata, and no 
change in mean award. More specifically, the findings indicate: 

• Maintaining approximate budget level results in a negligible increase in 
recipients, about 11,000 or less than .5 percent (Table 13). 

• The proportion of program costs awarded to low income recipients 
increases slightly and the proportion awarded to high income recipients 
decreases. 

• The mean award of $940 is unchanged. 

• Over 86 percent of those 1.2 million ineligible applicants under the full
formula remain ineligible under the reduced formula (Table 14).

• The majority of recipients in most award strata receive the same award
(the center diagonal of Table 14).

• Of the 250,000 students receiving maximum awards, the neediest
students, 90 percent continue to receive the maximum and 98 percent
receive within $200 of the maximum. Of the 480,000 students receiving
more than $1,400, 92 percent continue to receive in excess of $1,400.

• Just under 50 percent of the 350,000 students who received $400 or less
under the full formula continue to receive an award of $400 or less.
Thirty-six percent of the students who originally received $400 or less
become ineligible.



What Are the Characteristics Of Those Who Gain and Lose? 



In general, the following patterns describe those applicants whose awards
increased or decreased.



• Table 14 indicates that most students whose awards increased are clustered
in the middle of the award range; those receiving a smaller award (below
$500) under the full formula are most likely to receive a smaller award under
the reduced formula. The neediest students, those receiving the highest
awards, are least likely to have their awards decrease significantly.
Relatively few applicants gain or lose extremely large amounts (the upper
right and lower left of Table 14).

• Of those students whose awards increased, 72 percent increased by less 
than $600, 22 percent increased by $600 - $1,200 and 6 percent by more 
than $1,200. 

• Those gaining less than $600 had a mean AGI of $13,000 and mean net 
assets of $34,000; those gaining between $600 and $1,200 had a mean AGI 
of $14,000 and mean net assets of $37,000 and those gaining over $1,200 
had a mean AGI of $10,000 and mean net assets of $90,000. 

• Of those whose awards decreased, slightly less than 98 percent decreased
less than $600, about 2 percent decreased between $600 and $1,200 and less
than .1 percent decreased more than $1,200.

• Those students losing less than $600 had a mean AGI of almost $17,000 and
mean net assets of $12,000; those losing between $600 and $1,200 had a
mean AGI of $14,000 and mean net assets of $6,000; those losing more than
$1,200 had a mean AGI of about $12,000 and mean net assets of $3,000.



The following data summarize the percentage of Pell Grant recipients who gain, 
lose, and stay the same (within $50) by specific demographic and financial 
characteristics under the five data element formula when compared with the current 
formula using error free data in both runs. 



                                        Percentage Who   Percentage Who   Percentage Who
                                        Receive a        Receive the      Receive a
                                        Smaller          Same Award       Larger
Characteristics                         Award            (±$50)           Award

• All Applicants                        18               72               10

• Dependent Students with               22               64               15
  Family Size 4 and Under

• Dependent Students with               20               66               14
  Family Size 5 and Over

• Independent Students                  13               85               3

• Families with 1 in                    17               74               [illegible]
  Postsecondary Education

• Families with more than 1 in          22               63               [illegible]
  Postsecondary Education

• Dependent Students with Net           26               64               10
  Home Value under $10,000

• Dependent Students with Net           18               65               17
  Home Value over $10,000

• Dependent Students with Family        22               64               14
  Investments under $10,000

• Dependent Students with Family        12               69               19
  Investments Over $10,000

• Dependent Students with Family        21               65               14
  Business/Farm Value Under $10,000

• Dependent Students with Family        16               60               24
  Business/Farm Value Over $10,000

• Families with No Nontaxable           19               72               [illegible]
  Income

• Families with Some Nontaxable         18               71               11
  Income

• Dependent Students with No            18               70               12
  Extraordinary Family
  Medical/Dental Expenses

• Dependent Students with Any           22               62               15
  Extraordinary Medical/Dental
  Expenses

• Student Enrolled Full-Time            18               71               11

• Student Enrolled Less Than            15               79               6
  Full-Time

From this table we can conclude that:



• Almost three-quarters of all applicants would receive the same award 
under the reduced formula as under the full formula. 

• The vast majority (85 percent) of independent students are unaffected by
data element reduction. 

• Students who fare better than average under data element reduction as 
expected are those from families with higher home equity, larger 
investments, businesses, or farms. These wealth elements are not 
considered in the reduced data element formula. 

• Students enrolled less than full-time, reflecting a high proportion of
independent students, are less likely to be affected by data element 
reduction than are full-time students. 



What Are the Characteristics Of Newly Eligible Recipients? 



• An estimated 162,000 applicants who were ineligible under the full formula 
would become eligible under the reduced formula. 

• Approximately 46 percent of these newly eligible recipients would receive 
$600 or less, 36 percent between $601 and $1,200 and 18 percent more than 
$1,200. 

• Newly eligible recipients who would receive an award of less than $600
had a mean AGI of $19,000 and mean net assets of $58,000; those who
would receive between $600 and $1,200 had a mean AGI of $15,000 and
mean net assets of $67,000; those who would receive over $1,200 had a
mean AGI of $10,000 and mean net assets of $95,000.



What Are The Characteristics of Students Who Lose Eligibility? 

• An estimated 151,000 students who were eligible under the full formula 
would lose eligibility under the reduced formula. 

• Of those 151,300 who lose eligibility, 92 percent lose awards of less than 
$600, slightly less than 8 percent lose awards between $600 and $1,200 and 
less than 1 percent lose awards of over $1,200. 

• Virtually all of the neediest students, those receiving maximum awards, 
remain eligible. 

• Those students who lost less than $600 had a mean AGI of $22,000 and 
mean net assets of $16,000. Those losing between $600 and $1,200 had a 
mean AGI of $19,000 and mean net assets of $9,000. Those losing in excess 
of $1,200 had a mean AGI of $17,000 and mean net assets of $6,000. 



FINDINGS



The simulations presented in the prior sections of this chapter result in several 
outcomes. The first of these is a more thorough understanding of the budgetary and 
distributional effects of reducing the number of data elements that are used to 
calculate Pell eligibility and awards to five. 



The second outcome is the development of a thorough description of the 
comparative effects of data element reduction controlling for error. These compara- 
tive budgetary and distributional effects can be expressed on several levels. The data 
indicate the following general findings: 



• The greatest differences in the impact of data element reduction using the 
two data bases are evident at the aggregate level including program costs 
and number of recipients. Results are fairly similar across many dimen- 
sions on a more detailed level. 

• Use of an error free data base in simulating the effects of data element 
reduction dampens the increase in recipients and slightly increases the
budgetary impact. 

More specifically, a comparison of the two simulations indicates the following: 



• The error free five element formula with tax rate adjustments results in a 
level of recipients that is 1^2,000 students less than the standard 
simulation. 

• The baseline budget for the error free simulation is $215 million
less than the baseline budget for the standard simulation ($2.^8 billion).

• The net increase in program costs for an error free reduced formula
without tax rate adjustments ($149 million) is slightly larger than for the
standard simulation of a reduced formula without tax rate adjustments
($130 million; see Appendix D, Tables D-1 and D-2).

• Program costs for an error free simulation of a five element formula
without tax rate adjustment are equal to the baseline program costs of
about $2.^8 billion, suggesting that when error is eliminated, no increase
in taxation rates is necessary. (See Appendix D, Table D-2.)

• The average award in the error free simulations is unchanged under the
reduced formula, while the average award drops slightly in the standard
simulation.

On a more detailed level the simulations produce different results on the
following dimensions:

• More students receiving low awards ($500 or less) continue to receive such
awards under the standard simulations.

• More students receive lower increases ($600 or less) under the error free
simulation.



Differences on other dimensions between the simulations are minimal (e.g., 
within 2 to 3 percentage points) and mixed. 

CONCLUSION 

This chapter has presented the results of two simulations of reducing the number 
of data elements in the Pell need analysis formula to five. These simulations have 
advanced general understanding of the effects of data element reduction on an 
aggregate and an individual level. 

The second of these simulations was conducted with a data base from which 
error has been eliminated. This simulation permitted modeling the joint effects of 
eliminating error as well as reducing the number of data elements in the Pell formulae 
for the first time. A comparison of these simulations has permitted a better 
understanding of the implication of error on the prevalent assumptions concerning data 
element reduction and the differences relating to specific effects. 

A word of caution should be offered concerning the interpretation of the 
findings. These findings are subject to the same caveats concerning the static nature 
of the data base discussed in the Introduction. Perhaps a more important caveat 
however, relates to the analyses. We have designed these analyses as an evaluation 
not as a forecast of likely policy outcomes. An example of this difference is evident 
in the assumptions underlying the imputation of error to the data base and the error 
free simulation. We assume in this imputation and simulation that all error found in 
Pell QC Stage III is eliminated, even from the remaining data elements. Clearly, this
is an unlikely assumption for a policy forecast. However, it is fundamental to our
analysis from a research perspective and has produced valuable results.

DESCRIPTION OF THE MODEL AND DATA BASE



The program-wide simulations of full and five element formulae conducted for
this report have been produced from the official ED simulation model (the applicant- 
based model) with which the Pell Grant Branch, DPPD, produced the data tapes for 
analysis. 

The applicant-based model is a micro-model of the Pell Grant Program designed 
to simulate for ED policymakers changes in awards and recipients under different Pell 
program parameters. The model uses a weighted sample of approximately 160,000 
actual Pell applicants. This data base was used both in the program-wide simulation
and the assessment of individual data elements. 

The model computes a Student Aid Index, or eligibility index, for each applicant
using the Pell Grant family contribution schedule. It applies an imputed cost of
attendance and enrollment status for each applicant and computes an expected award.
Finally, the model applies a "show up rate," or estimation of the number of eligible
applicants who will submit Student Aid Reports to postsecondary institutions and
receive Pell Grants. The sample of applicants is weighted to produce estimates for
the population of applicants and recipients.
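
The last two steps, applying a show up rate and weighting the sample, can be sketched
as follows. This is a simplified illustration under stated assumptions, not the
applicant-based model itself: the `Applicant` record, its field names, and the sample
values are hypothetical, and the expected award is taken as already computed.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    weight: float          # sampling weight: applicants represented by this record
    expected_award: float  # award from the formula in dollars; 0 means ineligible
    show_up_rate: float    # assumed share of such applicants who become recipients


def estimate_program(applicants):
    """Return (estimated recipients, estimated program cost) for a weighted sample."""
    recipients = 0.0
    cost = 0.0
    for applicant in applicants:
        if applicant.expected_award <= 0:
            continue  # ineligible applicants receive nothing
        # Adjust the sampling weight by the show up rate, then accumulate.
        adjusted_weight = applicant.weight * applicant.show_up_rate
        recipients += adjusted_weight
        cost += adjusted_weight * applicant.expected_award
    return recipients, cost


# Hypothetical three-record sample.
sample_applicants = [
    Applicant(weight=100, expected_award=1000, show_up_rate=0.75),
    Applicant(weight=50, expected_award=0, show_up_rate=0.75),
    Applicant(weight=200, expected_award=500, show_up_rate=0.5),
]
print(estimate_program(sample_applicants))
```

Because the rate is folded into the sampling weight, the weighted count reflects
recipients rather than applicants, which is why simulation totals fall below the
actual number of applicants.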

The Pell Grant Branch, DPPD, has produced several program-wide simulations of 
the 1982-83 academic year for this analysis. The baseline simulations, which 
replicate the 1982-83 year, have the following characteristics: 

• The 1981-82 data base aged to represent 1982-83 applicant data 

• 1982-83 Pell Grant Program parameters: 

    - $1,800 legislative maximum award/$1,800 maximum award 

    - "Taxation rates" on discretionary income of 11, 13, 18, and 25 
      percent for dependent students increasing by income levels; 25 
      percent for married independent applicants and 33 percent for single 
      independent applicants with a family size of one 

    - Resource protection of $25,000 for home and an additional $25,000 
      for other investments 

• All awards were reduced by about 6 percent to reflect validation savings. 
  Therefore, the effective maximum award is less than $1,700 and the 
  minimum award is less than $100. 

• A participation or "no show" rate, stratified by income, was applied to all 
  applicants to estimate the number of eligible recipients who actually 
  receive Pell Grants. This accurately estimates the number of recipients, 
  but reduces the overall number of applicants below actual levels. 
IMPUTATION OF STAGE III ERROR PATTERNS 
TO THE ED APPLICANT DATA BASE 



This appendix describes the statistical techniques used to assign "best values"* 
to the applicant file. The purpose of the assignment procedure was to make possible a 
statistical simulation of the effects of program error rates on alternative eligibility 
formulae. Statistical procedures used to assign best values were designed to reproduce 
the patterns of reporting errors discovered in Stage III of the Pell Grant Quality 
Control Project. This appendix consists of two parts: general approach and 
imputation procedures. 

GENERAL APPROACH 

The selection of a procedure with which to most accurately impute best values 
to the ED applicant data base received much attention and thought, and several 
approaches were considered and rejected before a suitable approach was finally selected. 
The objective of the selection process was to maximize the accuracy of the 
imputation. To do so, it was necessary to capture those characteristics 
that were the greatest predictors of the probability and level of error for any single 
data element: reported value (zero/non-zero), dependency status, income, and error on 
certain other variables. The approaches considered included: 

• Statistical matching 

• Regression 

• Simultaneous interactions 

• "Cold decking"/ratio estimation 

• "Cold decking"/regression 

One of the most promising and yet straightforward approaches considered was 
statistical matching. Statistical matching is similar in approach to the commonly used 
procedure of exact matching, which matches records from one source with records 
from another source using identifiers, such as social security number, that enable the 
linkage of data from two discrete sources. Statistical matching links records from one 
data source with a second, similar source by minimizing a specified distance function 
(Radner, et al., 1980). Statistical matching is widely used in the preparation, 
manipulation, and analysis of large scale data bases, for example Census surveys 
(Radner, 1983). Matching is often used to impute or assign missing data values to cases 
on one data base (a recipient) by searching a second data base (a donor), identifying 
a donor case that is closest to the recipient case across specified dimensions (e.g., 
other data values or characteristics), and assigning the value of the item from the donor 
to the recipient case. 

*"Best values," as used in this context, refers to application data values that have 
been determined to be correct through a variety of data collection techniques used in 
the Pell Grant Quality Control project. 
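
The matching idea can be illustrated with a minimal sketch. It is not the procedure used in any of the cited studies: the Euclidean distance on unscaled variables, the field names, and the sample records are all assumptions made for illustration. Each recipient record takes the item of interest from whichever donor record is closest, and a donor may be reused any number of times, which is the unconstrained case discussed in the text.

```python
import math

def distance(a, b, keys):
    """Euclidean distance between two records over the matching variables."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def unconstrained_match(recipients, donors, keys, item):
    """For each recipient, copy `item` from the closest donor record.

    Donors may be reused any number of times (unconstrained matching).
    """
    matched = []
    for rec in recipients:
        closest = min(donors, key=lambda d: distance(rec, d, keys))
        matched.append({**rec, item: closest[item]})
    return matched

# Hypothetical records; field names and values are illustrative only.
donors = [
    {"income": 5000, "family_size": 3, "best_income": 5600},
    {"income": 12000, "family_size": 4, "best_income": 11800},
    {"income": 21000, "family_size": 5, "best_income": 24500},
]
recipients = [
    {"income": 5200, "family_size": 3},
    {"income": 20000, "family_size": 4},
]
result = unconstrained_match(
    recipients, donors, ["income", "family_size"], "best_income"
)
```

Because income is left unscaled here, it dominates the distance; a practical distance function would normally weight or standardize the matching variables.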

Two types of matching are commonly recognized. The first is unconstrained 
matching, which places no restrictions on the number of records that are matched 
from the recipient to the donor file (Okner, 1972). This approach has several 
weaknesses, which resulted in our rejecting it as an acceptable approach. 

With unconstrained matching both the mean and standard deviation of the 
estimated variables in the recipient file may differ from the corresponding statistics 
in the donor file. Unconstrained statistical matching has the advantage of permitting 
the closest possible match for each recipient record, but at the cost of increasing the 
sample variance of estimators involving the estimated variables. An unconstrained 
match amounts to taking a simple random sample with replacement of the records in 
the donor file. Thus, the distributions of the imputed variables added to the recipient 
file are distributions of the selected sample rather than the distributions as observed 
in the recipient file (Rogers, 1984). For these reasons, we rejected unconstrained 
matching as an approach to error imputation. 

The second type of statistical matching, constrained matching, held more 
promise as a method (Barr and Turner, 1980). Constrained matching ensures that each 
donor file record is matched with a recipient file record by duplication of recipient 
file records, if necessary. The advantages of a constrained match are that the 
multivariate distribution of the imputed variables identically matches the distribution in 
the donor file, as do the mean and standard deviation. One disadvantage is that 
matched pairs (from both files) potentially differ more with respect to 
common values than in an unconstrained match. The most significant disadvantages are 
that procedures that minimize the differences between paired cases require 
considerable computer time, particularly for large data sets, and potentially result in 
an expanded data set (Rogers, 1984). This posed serious time, resource, and 
computational problems, and led to the rejection of this approach. 

*The reader is cautioned not to confuse the concept of donor and recipient used here 
with the Pell Grant Recipient file and the Pell Grant Applicant file. 

Another approach to imputation considered was regression. Regression would 
allow extrapolation of error data beyond the recipient file, a key issue since the 
applicant file contains data values in excess of the recipient file (e.g., AGI). Regression, 
however, was rejected because it would assign a small amount of error to all cases and 
would not capture the incidence of error and the full impact of this error on individual 
eligibility and awards. 

A procedure of mapping the simultaneous interactions of all errors was 
considered. This would precisely replicate the error patterns including the level and 
interaction among errors. It was not considered feasible, since the complexity would 
have outstripped the computer resources and quickly exhausted the degrees of freedom 
on the Stage III file. Allowing interaction among the 18 variables, zero and non-zero 
reported value, error and no error, yields over 68 billion ($4^{18}$) combinations. 

Thus, we considered and adopted a "cold decking" process for cases without 
dependency status error that stratified the Stage III file on reported value (zero/non-zero), 
dependency status and income. The probability of error was computed for each 
stratum. The issue of estimating best values was more difficult. We considered a 
ratio estimator that, not unlike a regression coefficient, would permit extrapolation of 
best values beyond the range of recipient reported values. The ratio estimator had 
two flaws. First, and perhaps most serious, a ratio estimator is inappropriate and 
ineffective with zero reported values (since zero multiplied by anything is zero), and 
error patterns were highly dependent on reported value (zero/non-zero). 

The ratio estimator also limited the prediction of best value of a single variable 
to the reported value of that variable and could not account for simultaneity of errors. 
Because of these limitations we replaced the ratio estimator with multivariate 
regression models, although we continued to use a "cold decking" procedure stratified 
on income, reported values, and dependency status. This multivariate regression 
allowed us to control for the simultaneity of related errors as well as zero/non-zero 
reported values. This is described below. 

The cold decking technique employed to assign an application to an error status 
is currently used by Vital Statistics for estimating out-of-wedlock birth rates, by 
NCES in its primary and secondary school surveys, and by NCHS for its fetal surveys. 
Formal statistical analyses of the cold-deck approach can be found in Schaible (1979), 
Brewer (1979) and Oh and Scheuren (1981). 

Under the cold-deck approach the applicant file was first stratified into eight 
groups: 

• Dependent students with total family incomes up to $8,000 

• Dependent students with total family incomes between $8,000 and $15,000 

• Dependent students with total family incomes between $15,000 and $20,000 

• Dependent students with total family incomes over $20,000 

• Independent students with incomes up to $2,000 

• Independent students with incomes between $2,000 and $4,000 

• Independent students with incomes between $4,000 and $8,000 

• Independent students with incomes over $8,000 
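
The eight-way stratification can be sketched as a small lookup function. This is an illustration only: the function name, the label format, and the treatment of incomes that fall exactly on a boundary (assigned here to the lower stratum) are assumptions, since the report does not say how boundary cases were handled.

```python
def stratum(dependent, income):
    """Return a label for one of the eight cold-deck strata.

    Assumption: an income exactly equal to a boundary falls in the lower stratum.
    """
    if dependent:
        bounds = [8000, 15000, 20000]
        prefix = "dependent"
    else:
        bounds = [2000, 4000, 8000]
        prefix = "independent"
    for upper in bounds:
        if income <= upper:
            return f"{prefix}_up_to_{upper}"
    return f"{prefix}_over_{bounds[-1]}"

# Hypothetical incomes, one per stratum.
labels = {
    stratum(dep, inc)
    for dep, incomes in (
        (True, [5000, 10000, 18000, 30000]),
        (False, [1000, 3000, 6000, 12000]),
    )
    for inc in incomes
}
```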



Probabilities for various combinations of error patterns for each stratum were 
estimated from Stage III verified student data. A pattern was defined by the presence 
or absence of error on each of 18 verified application items. 

The patterns were found to depend on whether the reported value was zero. 
Each variable was subset into zero and non-zero subgroups. For each variable within a 
stratum there are then four possible events: 
• Reported value zero, no error 

• Reported value zero, error 

• Reported value not zero, no error 

• Reported value not zero, error 

As previously discussed, allowing interaction among the 18 variables, which 
would exactly reproduce the Stage III error patterns including simultaneity, yields over 
68 billion ($4^{18}$) possible error patterns for each stratum. To reduce the complexity of 
the error patterns, several assumptions were made based on simultaneous error 
patterns found in Stage III data. The presence of error on adjusted gross income, 
nontaxable income, and net home value were assumed to be dependent on each other, 
but independent of the presence of error on all other data items. Similar relationships 
were assumed for family size and number in college and for dependent student's 
income and dependent student's assets. The presence of error on the remaining 11 
data items was assumed to be independent of the presence of error on all other data 
items. Thus, the number of error patterns within each stratum was reduced to 140 
($4^3 + (2 \times 4^2) + (11 \times 4)$). 
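
The two pattern counts can be checked with a few lines of arithmetic. The sketch below only restates the grouping described above (one group of three jointly dependent items, two groups of two, and eleven independent items); the variable names are illustrative.

```python
EVENTS_PER_VARIABLE = 4  # (zero / non-zero reported value) x (error / no error)

# Full interaction among all 18 verified application items.
full_patterns = EVENTS_PER_VARIABLE ** 18

# Reduced count under the grouping assumptions:
# one group of 3 jointly dependent items, two groups of 2, and 11 independent items.
reduced_patterns = (
    EVENTS_PER_VARIABLE ** 3
    + 2 * EVENTS_PER_VARIABLE ** 2
    + 11 * EVENTS_PER_VARIABLE
)

print(full_patterns, reduced_patterns)
```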

Error patterns were assigned to applications with probabilities proportional to 
their occurrence within the strata. For every variable in the pattern assigned that 
contained no error, the reported value was assumed to be the best value. For variables 
assigned to an error status the best value was computed as a linear function of the 
reported value and other variables shown in Stage III to be predictive of error values. 
The formula used was: 

$$T_i = A_i B + E_i,$$

where: 

$T_i$ is an $n \times 1$ vector of imputed best (true) values 

$B$ is a $p \times 1$ vector of coefficients associated with application variables and 
an intercept term and estimated using OLS procedures with Stage III data 

$A_i$ is an $n \times p$ matrix of application values predictive of true values and 
including the reported value on the variable being imputed 

$E_i$ is an $n \times 1$ vector of random, normal deviates with an expectation of 0 and 
a variance equal to the observed residual variance from the Stage III data. 

A separate equation was estimated for each of the IS variables to be imputed in 
each of the 8 strata for a possible total of equations. Strata were collapsed for 
some variables due to small degrees of freedom. 






Given assumptions of linearity within the parameters, a normal distribution of 
errors, and $E(B \mid \text{recipients}) = E(B \mid \text{applicants})$, Ericson (1969), Royall (1970), and 
Cochran (1977) have shown that $A_i B$ is the maximum likelihood estimate of $T_i$ within a 
stratum. We added $E_i$ to $A_i B$ to reproduce the observed within-strata variance while 
preserving the unbiased expectation of $T_i$: 

Because $E(E_i) = 0$ and given the assumptions above, 
$E(A_i B) = T_i$. 

Therefore $E(A_i B + E_i) = T_i$. 
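
The regression step can be illustrated with a minimal single-predictor sketch. It is not the SAS code used for the report: the single explanatory variable, the function names, the sample (reported, best) pairs, and the random seed are all assumptions. The sketch fits an OLS line to verified cases, then imputes a best value as the fitted value plus a normal deviate whose variance equals the residual variance, so that repeated draws average out to the fitted value.

```python
import random
import statistics

def fit_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (two fitted coefficients).
    resid_sd = (sum(r * r for r in residuals) / (len(x) - 2)) ** 0.5
    return b0, b1, resid_sd

def impute_best(reported, b0, b1, resid_sd, rng):
    """Imputed best value: fitted value plus a normal deviate with mean 0."""
    return b0 + b1 * reported + rng.gauss(0.0, resid_sd)

# Hypothetical verified cases with an error on this item: (reported, best) values.
reported = [4000.0, 6500.0, 9000.0, 12000.0, 15000.0]
best = [4600.0, 7100.0, 9300.0, 13100.0, 15900.0]
b0, b1, sd = fit_ols(reported, best)

rng = random.Random(1983)  # fixed seed so the sketch is repeatable
fitted = b0 + b1 * 10000.0
draws = [impute_best(10000.0, b0, b1, sd, rng) for _ in range(2000)]
```

Adding the random term keeps the spread of imputed values close to the spread observed among verified cases; dropping it would place every imputed value exactly on the regression line.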

Regression models for family size and number in college did not provide 
sufficient predictive results. The joint distribution of best family size and best 
number in college conditioned on reported dependency status, reported family size, 
and reported number in college was determined for the recipient data base. This 
distribution was then imputed to the applicant data base. The following example 
illustrates this procedure for a selected combination of dependency status, reported 
family size, and reported number in college. 

Dependent Students Reporting Family Size of Four 
and Two Enrolled in Postsecondary Education 

Distribution of Best Values 

                       Number in College 
Family Size         1         2         3       Total 

     2             2.4       1.2       0         3.6 
     3            13.7       4.7       1.2      19.6 
     4             8.3      59.6        .6      68.5 
     5              .6       5.9       0         6.5 
     6             0          .6        .6       1.2 
     7             0         0          .6        .6 

   Total          25        72         3       100 
Whenever a student on the applicant file reports as dependent with a family size 
of four and two in college, best family size and best number in college were assigned 
using the probabilities given in the cells of the table. Similar distributions were 
determined and used for each combination of reported dependency status, family size, 
and number in college. 
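
Drawing from such a joint distribution can be sketched as a weighted random choice over the table cells. The cell percentages below are transcribed from the example table for dependent students reporting a family size of four with two in college; a few cells are hard to read in the source copy and were filled in from the row and column totals, so the values should be treated as illustrative. The function name and seed are assumptions.

```python
import random
from collections import Counter

# Keys are (best family size, best number in college); values are cell percentages.
# Cells with zero probability are omitted.
JOINT = {
    (2, 1): 2.4, (2, 2): 1.2,
    (3, 1): 13.7, (3, 2): 4.7, (3, 3): 1.2,
    (4, 1): 8.3, (4, 2): 59.6, (4, 3): 0.6,
    (5, 1): 0.6, (5, 2): 5.9,
    (6, 2): 0.6, (6, 3): 0.6,
    (7, 3): 0.6,
}

def draw_best(joint, rng):
    """Draw one (best family size, best number in college) pair by cell probability."""
    cells = list(joint)
    weights = [joint[cell] for cell in cells]
    return rng.choices(cells, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [draw_best(JOINT, rng) for _ in range(5000)]
counts = Counter(draws)
```

In a full implementation one such table would be needed for every combination of reported dependency status, family size, and number in college.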
The cold-deck procedures described above are inappropriate for determining 
eligibility for applicants that report they are independent but who are, in fact, 
dependent. For such dependency status "switchers" it is necessary to impute all 
parental income data. The imputations must recreate a pattern of relationships 
between all imputed variables. To this end, for independent to dependent switchers, 
we employed a "hot-deck" imputation procedure. 

In the hot-deck approach each switcher has a separately chosen set of family 
income variables imputed from among the "donor" values from dependent student 
applications. The hot-deck approach is currently in use in the Current Population 
Survey, Social Security Benefit Estimates, various Department of Energy Surveys, and 
is being tested on IRS Statistics of Income 1040 Series. Good theoretical discussions 
of hot-deck imputations can be found in Oh and Scheuren (1980), Welniak and Coder 
(1980), Chapman (1976) and Ernst (1980). 

Hot-deck imputations were conducted using a two stage process. First, a 
probability of dependency status switch was calculated. For each applicant a 
switching status (yes or no) was assigned with a probability proportional to the 
switching rate. 

Second, for each applicant assigned to a switching status a donor was selected 
from dependent applicants. The donor and recipient were matched by random 
selection with replacement. A similar approach was used for dependent to 
independent switchers. 

IMPUTATION/ASSIGNMENT PROCEDURES 

The accurate imputation of Stage III error to the applicant data base required 
systematic attention to numerous important details which occurred in three separate 
phases. First, analysis of the frequency, simultaneity, and level of error on the Stage 
III data base was necessary. Second, development of imputation software was 
required. Lastly, tests for goodness of fit were required to assess the accuracy of the 
imputation. Each of these phases is treated in the following sections of this appendix. 

Analysis of Stage III Recipient Data 



Data from the Stage III study were analyzed to determine the distribution of 
errors. This analysis involved three steps. The first step determined which cases had 
dependency status error. The second step determined which students had error in each 
variable. The third determined the degree of error for each variable. 

Dependency Status Error. Dependency status error presented a unique problem 
and therefore was handled separately from all other errors. The following table 
summarises the frequency of the two types of dependency status error found in the 
Stage III data. 



                                                  Percentage of Cases 
Characteristics                               with Dependency Status Error 

• Students reporting as independent,                     16.9% 
  unmarried, and living alone 

• Students reporting as independent                       8.5% 
  and married or family size greater 
  than one 

• Students reporting as dependent                          .6% 



These error rates were later imputed to the applicant file. 



Cases selected as dependency switchers were handled differently than all other 
cases. Reported data and "best" data are unrelated for switchers. For example, 
students who report as independent report their own adjusted gross income. The "best" 
adjusted gross income for a student who switches to dependent is his parents' adjusted 
gross income, which was not reported. 

For each applicant selected as a switcher, a switcher (in the same direction) was 
randomly selected from the Stage III data base with replacement. The best values 
from the "donor" were then mapped onto the applicant record. No additional 
imputation procedures were required for dependency status switchers. 

Presence or Absence of Error. For each variable, probability tables giving error 
rates conditioned on strata and zero/non-zero reported values were produced. These 
error rates were later used to impute error to applicants. As stated earlier, the 
presence or absence of error was assumed to be interdependent for some variables. 
Joint distributions of error were determined for these variables, again conditioned on 
strata and zero/non-zero reported values. 

Degree of Error. For all but three of the eighteen variables, regression 
equations were determined to explain the degree of error. Student marital status was 
treated as a dichotomous variable (married/not married). Thus, if a case is determined 
to have an error in student marital status, the best value is the complement of the 
reported value. 

Family size and number in college are discrete variables for which regression 
equations with sufficient prediction ability could not be determined. Instead, the joint 
distribution of best values for family size and number in college conditioned on 
respective reported values was determined. This joint distribution, given in Table C-1 
of Appendix C, was later imputed to the applicant file. 

Regression equations using Ordinary Least Squares (OLS) estimation were 
determined for each of the fifteen remaining variables within each stratum. Strata 
were collapsed for some variables to ensure sufficient degrees of freedom. For each 
variable, only Stage III cases with error in that variable were used in estimating the 
regression equations. The dependent variable in each regression was the computed 
best value. All explanatory variables were reported values or functions of reported 
values. In general, income and asset variables along with the reported value were used 
to explain the best values. 

"Dummy" variables were used to explain the effects of zero reported values in 
the explanatory variables on best values. For each variable, a "dummy" variable was 
assigned. The "dummy" takes on the value zero when the variable it describes was 
zero, and a value of one otherwise. 

Table C-2 of Appendix C lists the regression equations determined by OLS for 
each variable. Variables were stratified as shown in the table. Dependency status is 
given at the top of the page. The equations are grouped by dependent variable. Rows 
and columns represent income levels and explanatory variables, respectively. Each 
cell contains the OLS estimator for the regression coefficient for its respective 
income level and explanatory variable. 

The column labeled "INTERCEPT" gives the OLS estimate of the best value when 
all other explanatory variables are zero. The column labeled "R-SQUARE" ($R^2$) gives 
a measure of how well the equation explains the variance in the dependent variable. 
$R^2$ is the ratio of variance explained by the regression equation to the total variance. 
An $R^2$ of one would indicate a perfect fit of the data to the equation. A zero $R^2$ 
would indicate that the equation explains none of the variance. 
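
The R-SQUARE statistic can be computed directly from actual and fitted values. The sketch below uses the form one minus the ratio of residual to total sum of squares, which equals the explained share of variance for an OLS fit that includes an intercept; the function name and sample values are illustrative.

```python
def r_square(actual, fitted):
    """Share of the variance in `actual` explained by `fitted`.

    Computed as 1 - (residual sum of squares / total sum of squares), which
    matches explained / total variance for an OLS fit with an intercept.
    """
    mean = sum(actual) / len(actual)
    total_ss = sum((y - mean) ** 2 for y in actual)
    residual_ss = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    return 1.0 - residual_ss / total_ss

actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_square(actual, actual)           # fitted values equal actual values
no_fit = r_square(actual, [2.5] * 4)         # fitted values equal the mean only
```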

Imputation Software 

The Statistical Analysis System (SAS) was used for all imputation software. The 
statistical procedures and file management capabilities of SAS were conducive to the 
imputation process. 

Dependency Switchers. The first step in the production of software was to 
separate the Stage III data base into three separate files: 

• Independent to dependent switchers 

• Dependent to independent switchers 

• Nonswitchers 

A SAS program was written to compare reported dependency status to best 
dependency status for each Stage III Pell recipient and to place each case into the 
appropriate file. This program also used the SAS procedure "FREQ" to produce a table 
giving the rates of dependency status errors. These rates were then used to produce 
code to select switchers for the imputation of dependency status error. 

The switcher program stratifies applicants into three groups using reported 
values: dependents, unmarried independents living alone, and all other independents. 
The program then generates a random number from a uniform distribution between 
0 and 1 (U (0,1)) for each case. If this random number is less than or equal to the 
corresponding error rate, the case is selected as a switcher. If a case is not selected 
as a switcher the best dependency status is the reported dependency status. For 
switchers, the best dependency status is the complement of the reported dependency 
status. 

The program then assigns best values to switchers. Switchers are divided into 
two groups: independent to dependent and dependent to independent. The Stage III 
records within each of the two switcher files are arbitrarily numbered 1 through n, 
where n is the number of cases in the file. A random integer J is generated from a 
U (l,n) distribution for each switcher. The applicant switcher is then assigned all best 
values from the Jth record on the appropriate Stage III switcher file. The imputation 
process is then complete for switchers. 
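
The switcher logic can be sketched in Python rather than SAS. This is an illustration, not the original program: the three error rates come from the Stage III dependency status table in this appendix, while the group labels, field names, donor records, and seed are hypothetical. A case is selected when a U(0,1) draw is less than or equal to its group's rate, and a selected case takes all best values from the Jth record of the matching donor file, with J drawn from U(1,n).

```python
import random

# Dependency status error rates by reported group (Stage III table in this appendix).
SWITCH_RATES = {
    "independent_single_alone": 0.169,
    "independent_other": 0.085,
    "dependent": 0.006,
}

def impute_switcher(applicant, donors_by_direction, rng):
    """Return a copy of the applicant, with donor best values if selected as a switcher."""
    rate = SWITCH_RATES[applicant["group"]]
    if rng.random() > rate:
        # Not selected: best dependency status equals reported status.
        return {**applicant, "switcher": False}
    direction = "dep_to_ind" if applicant["group"] == "dependent" else "ind_to_dep"
    donors = donors_by_direction[direction]
    j = rng.randint(1, len(donors))  # J ~ U(1, n); donors can be reused across cases
    return {**applicant, **donors[j - 1], "switcher": True}

# Hypothetical donor files and applicant record.
donors = {
    "ind_to_dep": [{"best_agi": 18000.0}, {"best_agi": 26500.0}],
    "dep_to_ind": [{"best_agi": 3200.0}],
}
rng = random.Random(7)
applicant = {"group": "independent_single_alone", "agi": 2500.0}
results = [impute_switcher(applicant, donors, rng) for _ in range(20000)]
share = sum(r["switcher"] for r in results) / len(results)
```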

Error Rates. Secondly the file containing nonswitchers was input to FREQ to 
produce tables of error rates for each variable. These rates were stratified by 
reported dependency status, income, and reported zero/not zero. The FREQ procedure 
also produced a disk file containing error rates for each variable within each stratum. 
The disk file of error rates was then input to a code generator (written in SAS) which 
produced the software to impute error rates. 

The error rate imputation software determines to which stratum each case 
belongs and assigns the appropriate error rate for each variable. The program then 
generates a random number from a U (0,1) distribution. If the random number is less 
than or equal to the error rate the case is chosen to receive error. Otherwise, no error 
is assigned to the case for that variable. For each case not selected to receive error 
on a particular variable, the reported value is taken as the best value and the 
imputation process is complete for that variable within the case. 
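
The error rate step amounts to a table lookup followed by a uniform draw. In the sketch below the rates, stratum label, and function name are hypothetical; only the selection rule (error is assigned when the U(0,1) draw is less than or equal to the rate for the case's stratum and zero/non-zero reported value) follows the text.

```python
import random

# Hypothetical error rates for one variable, keyed by
# (stratum label, whether the reported value is zero).
ERROR_RATES = {
    ("dependent_up_to_8000", True): 0.05,
    ("dependent_up_to_8000", False): 0.30,
}

def assign_error(stratum_label, reported, rng, rates=ERROR_RATES):
    """Return True if the case is chosen to receive error on this variable."""
    rate = rates[(stratum_label, reported == 0)]
    return rng.random() <= rate

rng = random.Random(42)  # fixed seed so the sketch is repeatable
flags = [assign_error("dependent_up_to_8000", 6200.0, rng) for _ in range(20000)]
observed_rate = sum(flags) / len(flags)
```

Cases that are not flagged keep their reported value as the best value for that variable.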

Best Values. The SAS procedure REG was used to obtain regression equations for 
each variable within each stratum. The REG procedure produced tables giving 
estimated regression coefficients and other statistics for each variable from the Stage 
III data base. Only those cases in error for a variable were used in determining the 
regression equation for that variable. The tables allowed us to make decisions about 
which strata (if any) to collapse to ensure sufficient degrees of freedom. After 
redefining the strata, REG was run again on the Stage III data. This iteration of REG 
produced both tables and a disk file containing regression coefficients for each 
variable within each stratum. The regression equations are given in Table C-2 of 
Appendix C. The coefficients on the disk file were run through a code generator which 
produced the best value imputation software. 



The best value imputation software assigned each applicant a regression equation 
for each variable for which the applicant was selected to have error. The equation 
assigned was dependent upon the applicant's stratum. The best value was then 
computed as the sum of the products of all regression coefficients with corresponding 
reported or "dummy" values. The concept of dummy variables was discussed earlier. 
For those cases not selected for error, the best value was set to the reported value. 
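
The sum-of-products calculation, including the zero/non-zero "dummy" terms, can be sketched as follows. The coefficient values, variable names, and the "_DUMMY" naming convention are hypothetical and are not taken from Table C-2; the random residual term is left out so that only the deterministic part of the equation is shown.

```python
def dummy(value):
    """Dummy is 0 when the variable it describes is zero, and 1 otherwise."""
    return 0 if value == 0 else 1

def best_value(coefficients, reported):
    """Sum of products of regression coefficients with reported or dummy values."""
    total = coefficients["INTERCEPT"]
    for name, coef in coefficients.items():
        if name == "INTERCEPT":
            continue
        if name.endswith("_DUMMY"):
            # A dummy term refers to the reported variable with the same base name.
            total += coef * dummy(reported[name[: -len("_DUMMY")]])
        else:
            total += coef * reported[name]
    return total

# Hypothetical equation for one variable in one stratum.
coefficients = {"INTERCEPT": 500.0, "AGI": 0.9, "NONTAX": 0.2, "NONTAX_DUMMY": 300.0}

zero_nontax = best_value(coefficients, {"AGI": 10000.0, "NONTAX": 0.0})
some_nontax = best_value(coefficients, {"AGI": 10000.0, "NONTAX": 1000.0})
```

With these made-up coefficients, a zero reported nontaxable income contributes nothing beyond the intercept and AGI terms, while a non-zero report adds both its own product and the dummy shift.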

Final Merge. The applicant switchers and nonswitchers with best values 
replacing reported values were merged onto one file using SAS. This new file was 
formatted identically to the original applicant file so as to be compatible with ED's 
applicant-based model. 

Software Validation 

Several measures were taken to ensure quality in imputation software. All code 
was manually reviewed by the programmer and by other analysts. Code generators 
were used to reduce the probability of syntax errors. Code produced from generators 
was thoroughly checked. Imputation software was tested on Stage III data before being 
used on the applicant data base. 

Testing of Dependency Status Software. The Stage III data base was treated as 
if it contained applicant data and was input to the dependency status software. The 
frequency of imputed dependency status error was then compared with the frequency 
of actual dependency status error. The best values mapped to the switchers were 
compared to the "donor" values. These measures ensured that the dependency status 
software was logically correct and produced imputed data stochastically consistent 
with the original dependency status data. 

Testing of Error Rate Imputation Software. The Stage III file was again treated 
as if it contained applicant data to test the error rate imputation software. Imputed 
error rates were compared to actual error rates. The results confirmed that the 
imputation software yielded error rates consistent with actual error rates. 
Testing of Best Value Software. Similarly, the Stage III data was used to test the 
best value software. Mean imputed best values were compared by stratum to mean 
actual best values. Table C-3 of Appendix C displays the results of this comparison. 
These results confirm the validity of the best value software. 



Testing of the Final Merge. To ensure that the final data tape created from the 
imputation process was compatible with ED's model, extensive checks were performed. 
The imputed data base was compared to the original applicant data base record by 
record to verify that the two data sets were identically sorted. Fields containing 
variables not affected by the imputation process were compared between the original 
and the imputed data base. Ranges of all items on the imputed data base were 
compared to the ranges of respective items on the original file. Hexadecimal dumps 
from both files were compared. All of these tests ensured the compatibility of our 
data base with ED's model. 

Imputation of Error to Applicant Data Base 



The applicant data base was run through the programs described in the 
Imputation Software section. These programs replaced existing data items with 
imputed data. Dependency status error was assigned first. Cases selected as 
switchers received best values from Stage III "donors" and were separated into a new 
file. Error rates were imputed next. Applicants were selected to have error at the 
rate of observed error in the Stage III data base for each variable. Best values were 
then assigned to these cases chosen to have error. Best values were computed by 
substituting reported values into regression equations obtained from Stage III data. 
Finally applicant switchers and nonswitchers were merged producing a file of imputed 
data. 



Goodness of Fit Tests 



The Stage III data base and the imputed applicant data base were compared to 
ensure that the distribution of error on the applicant file approximated the distribution 
of error on the Stage III file. Means of imputed and best values are displayed in Table 
C-3 of Appendix C. After having submitted our imputed data base to ED for 
recalculation of award, we found a savings of $213 million when error is eliminated 
from the applicant data base. This is comparable to the Pell QC Stage III study which 
estimated a savings of $220 million. 
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t.t«9<» 



ITAal III IMPUTED 
2,1177 

0. 0«oe 
••Ste* 

ta.osst 

S,t«l9 

1, tl77 

0,0000 



A»»l.ieANT IMPUTED 
PEKCENT 

2,'SMt 
2,aS80 
lO^TRSS 
t«.«8«S 

o.nft* 

9««*7«ft 

l,7tOS 
3,74«0 

l,0t7S 

0,Mt« 



if»0*TCO fAMILV IXZItft REPORTED • IN COLLECEaf 



•SItTi 



•tCITi 


STAIC III ACTUAL 


•TAtC III 


INRUTED 


n IN eOLLEOI 


REMCENT 


RfKeiNT 




1 


2«'20*I 




9.S7«S 


s 


t^iio* 




2.1909 


1 


S.ttot 




«,sott 


a 


S«7ttt 




S.229t 


t 


lt«9S7t 




t(.t2IO 


1 


t2«0M9 




7,9I*« 


t 


*«209t 




9.37*9 


1 


92«707t 




9S,7*)« 


s 


S«S*I9 




9,S7*S 


• 


t«2MI 




0«0000 


1 


t.tiot 




t,9791 



ARPLICANT IMPUTED 
RfRCENT 

2;«29S 
0««9RS 
S,27tS 
• JMl 
1 1 ^9*2S 
11.71t9 

6«aou 

9S.M7* 
3,2M9 

O.MTO 



RIRORTCD fANILT SIZES* RIRQKTED f IN COLLEaEsl 



•BCtTi 
'ANILY aiZE 



•iEST» 

« IN eOLlESI 



STAtC III ACTUAL 
REMCENT 



STAOC III INRUTCO 
RfRCENT 



t 




0,0000 


a 


t.*«2* 


0,0000 


t 


S««*97 


9,a9«9 


9 


t ,7732 


0,0000 


t 


t,IOI9 


o«oooo 


a 


t*,2«90 


to, tots 


s 


«,S70* 


IO,«ORt 


t 


t,*«2* 


t.stu 


a 


9«i*a3 


S,*SM 


s 


A7,*e7tt 


««,0«0« 


• 


t,M«t 


t,ttu 


a 


9.«ni 


«,0'I0« 


1 


l,739«t 


9 



ARPLICA/4T IMPUTED 
PERCENT 

1^8M2 
l«7ft32 
3«a*S9 
t ,M«2 
3,1300 

«9,M90 
•,0880 
1,88«2 
<i,8«8« 

«T aiai 
2,3300 
9.1008 

2,0l9tt 



RERORTEO FANILV 8XZER8 RCl*OMTtD • IN COLLEaEs* 



•8CfTi 

RANIIT SIZE 




•BfiTt 

• IN eOLLEOC 

t 
a 
1 

9 

t 



4TA0E III ACTUAL 
REVCENT 



c-<» 



8«8St7 
8,8St7 

i3,n7t 

7,tt«l 
8,8317 
t9,Tl«8 
OS.StSf 



STA8E XII IMPUTED 
RERCENT 



86 



0,0900 
t«,2897 
21 ,8286 
7,t82t 
0,9999 
7,M2t 
90,999« 



ARRLICANT IMPUTED 
RERCENT 

7;228t 
9,7I2« 
18,3739 
8,8189 
9,7228 
I3,99«a 
82,77tt 



TABLE C-1 



JOINT DISTRIBUTION OF "BEST" FAMILY SIZE AND 
"BEST" NUMBER IN COLLEGE BY RESPECTIVE 
REPORTED VALUES 

OIMNO£WT ITUOCNTI 
ttl^OHTID fknili tllff MfOttIO i IN eOlLCOCsl 



'iCIT' 

f IN eaueee 

1 
I 
1 
1 
} 
1 
t 
1 



STAOI III ACTUAL 

S«tOM 
S,|S1« 
1S,*00« 

•.itSO 



ITAal II! IM^UTIO 
PlffCCMT 

I.TTTt 
11,1111 

I.TTTt 
U.***T 
U.***T 

8.5SSS 

a.ssss 



T.dtSt 
T,S*SI 

i««toos 

S^ltTS 

s«,sest 

H,«JT5 

«.ise} 







•BfiT» 


0TA6C ttt ACTUAL 


STAOI 


tlx 


tMPuno 


rAMlLV 


StZI 


• IN eouioe 


MUCINT 










I 


t;i«Tf 






A, 001* 






1 


t^llOT 






2, 0*00 






( 








S,1*SS 






t 


0,1T«J 






A.OOU 






1 


is,«tTa 






a.i*si 






t 


2^00«T 






2.0000 






s 


i«OS«T 






0,0000 






1 


l,tT*2 






0,0000 






} 


SS,*T)i 






OS,S001 






1 


<.1S«0 






2.0«00 




10 


1 


t,l«TS 






2,0«00 




10 


s 


2.00«T 






2,OAao 








REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED
FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

1 ^'s«o* 

2,S0«1 

T.TS*1 

is,a*T« 



1 

2 
S 
2 
S 
2 
I 
9 
• 



2««ltT 
S«OOM 

2.o«*a 

1S«««0* 
10^T9«f 
0,00*0 

2,0092 

S.OTIT 



2.0S10 
T,t«A7 

0. 0000 
10,S2*S 
10,1211 
IS, TIM 
JO, •TIT 

2,0310 

1, *310 



?,2«M 
1 ,MSO 

1 ,s«o* 

Stt,2SSS 
2,0T22 
I ,TT»2 
1.0930 



APPLICANT IMPUTED
PERCENT

2;S2S0 
S,a20« 
1,310* 
U,a«i2 
a,oaP* 

«,0T92 
9t,9«0T 
]»,4S3* 
i,§3P0 






TABLE C-1 



JOINT DISTRIBUTION OF "BEST" FAMILY SIZE AND
"BEST" NUMBER IN COLLEGE BY RESPECTIVE 
REPORTED VALUES 

DEPENDENT STUDENTS

---------- REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible] ----------

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

7 1 20,}«3* 0 18,8811 

7 • AO.SUS 88 81.2987 

---------- REPORTED FAMILY SIZE=7 REPORTED # IN COLLEGE=[illegible] ----------

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

7 • 100,000 100 100,000 

---------- REPORTED FAMILY SIZE=7 REPORTED # IN COLLEGE=[illegible] ----------

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

S t 100,000 100 100,000 






TABLE C-1 






JOINT DISTRIBUTION OF "BEST" FAMILY SIZE AND
"BEST" NUMBER IN COLLEGE BY RESPECTIVE 
REPORTED VALUES 



DEPENDENT STUDENTS

"BEST"

FAMILY SIZE  # IN COLLEGE

ft 1 

T I 

• t 

• I 

• 1 

• t 



STAGE III ACTUAL
PERCENT

lO.SlSl 



STAGE III IMPUTED
PERCENT

o.ooeo 



APPLICANT IMPUTED
PERCENT



9««i99 
2a,tlSt 

1 t,003k 

■«1tlt 

•.rtsi 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

FAMILY SIZE

T 
T 
T 
• 



"BEST"

# IN COLLEGE

t 
t 

s 
1 
2 
1 
t 
2 



STAGE III ACTUAL
PERCENT

2««mt 

■ .20M 
If.OlM 

12.0201 

s.ooit 



STAGE III IMPUTED
PERCENT

0 
10 

0 
0 



APPLICANT IMPUTED
PERCENT

sJsooo 

12,0119 

10,T«2 
c,07oa 
S.S217 

1T,0010 
2,0792 

11, 0^71 

s.o««s 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"

FAMILY SIZE

s 

9 
S 
0 

T 
T 
0 



"BEST"

# IN COLLEGE

1 
1 
I 
1 

2 
S 
« 



STAGE III ACTUAL



AOtO 
«««0 



0,001 0 

o,ooi« 

11,9007 

i 1 .atoo 
J7.i002 



STAGE III IMPUTED
PERCENT

0.0000 
7,1«20 
t«.2t97 
0.0000 
21.1200 
ia.2097 
42,0971 



APPLICANT IMPUTED
PERCENT



7^0091 
0,1209 
7,tiao 
9,39S0 
t2,OtS2 
22,tSM 
10:5179 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"  "BEST"

FAMILY SIZE  # IN COLLEGE

0 2 

9 1 

0 1 

T 1 

• 1 

• 0 



STAGE III ACTUAL
PERCENT

0:9020 
10,1021 

0,T000 
10,1000 
20,2atl 
11 .0000 



STAGE III IMPUTED
PERCENT



0 
■0 

10 

9 

to 

20 



APPLICANT IMPUTED
PERCENT



19«iaM 
22,ail0 
n,40ii 

1«,0M9 
to.iaao 
lO.alOO 



---------- REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible] ----------

"BEST"

# IN COLLEGE



STAGE III ACTUAL
^,.7 100.000 



STAGE III IMPUTED
PERCENT

100 



APPLICANT IMPUTED
PERCENT



100.000 



TABLE C-1

JOINT DISTRIBUTION OF "BEST" FAMILY SIZE AND
"BEST" NUMBER IN COLLEGE BY RESPECTIVE
REPORTED VALUES

DEPENDENT STUDENTS



s 
s 

T 
f 



# IN COLLEGE

1 
1 
t 
t 
1 
I 
1 



STAGE III ACTUAL

8,loa< 

10.8«10 



STAGE III IMPUTED
PERCENT

7,6*tl 
T,»*ll 
T,»»t3 
7.*«ll 

18.«*l9 
T.8»»l 

23.07*« 



APPLICANT IMPUTED
PERCENT

8,'S1«1 
8.7*40 
7,«H7 
7,*ao« 

12,9841 
8.S14* 

28.S191 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"

* 
• 
• 



"BEST"

# IN COLLEGE

1 
1 
I 
t 



STAGE III ACTUAL

14«l*f 1 
14,1*91 
97.*4*9 



STAGE III IMPUTED

14.1897 
14.1897 
28.9714 
42.8971 



APPLICANT IMPUTED
PERCENT

12,919* 
11,1**0 
14,92M 
*1.9189 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"
FAMILY SIZE

7 
7 

f 



"BEST"

# IN COLLEGE

1 
1 
1 

3 
4 



STAGE III ACTUAL
PERCENT



18,7884 
29,1*71 
18.7804 

21,4111 
20.1*08 



STAGE III IMPUTED
PERCENT 

8 
*8 

0 
28 
18 



APPLICANT IMPUTED
PERCENT

It, 2771 
21,2891 
22«4«00 
17,2**1 
1«.*787 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"
FAMILY SIZE

8 
• 
• 



"BEST"

# IN COLLEGE

2 
S 
4 



STAGE III ACTUAL
PERCENT 

24,9702 
2«,0088 
91.0218 



STAGE III IMPUTED  APPLICANT IMPUTED
PERCENT PERCENT 



96 
0 
90 



27;m*2 
29,2874 
47.12*4 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"
FAMILY SIZE

"BEST"
# IN COLLEGE

STAGE III ACTUAL
PERCENT 



STAGE III IMPUTED
PERCENT 



APPLICANT IMPUTED 
PERCENT 



100.000 



100



100.000 






TABLE C-1 



JOINT DISTRIBUTION OF "BEST" FAMILY SIZE AND
"BEST" NUMBER IN COLLEGE BY RESPECTIVE
REPORTED VALUES

DEPENDENT STUDENTS



"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED

FAMILY SIZE  # IN COLLEGE






• 


t 






10 


t 


ii.^atT 100 








REPORTED FAMILY SIZE=10 REPORTED # IN COLLEGE=[illegible]

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT




s 


t 


S3^9M9 0.000 






1 


#4.i0Sl lOOtOOO 










"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED

FAMILY SIZE  # IN COLLEGE






• 


1 


19^112* 29 




10 


s 


7a,0S7a 7? 








REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT




10 


• 


100,000 100








REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT




10 


* 


100.000 100 



APPLICANT IMPUTED
PERCENT

So.aaif 



APPLICANT IMPUTED
PERCENT

T9,T9T0 



APPLICANT IMPUTED
PERCENT



it^asso 

70,7011 



APPLICANT IMPUTED 
PERCENT 



100,000 



APPLICANT IMPUTED
PERCENT



100,000 






TABLE C-1 

JOINT DISTRIBUTION OF "BEST" FAMILY SIZE AND
"BEST" NUMBER IN COLLEGE BY RESPECTIVE 
REPORTED VALUES 

DEPENDENT STUDENTS
REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

f 1 S2^S810 6*,***7 )i«.'SYSO 

• t S9,3419 SS.S3S3 ai«***7 
tt SI, 0779 0,0000 2S,99tS 

---------- REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible] ----------

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

10 a S2j<o7s o«eoo ss^ssss 

tt t *7.09aT 100,000 ^k,kkk7 

REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

• i 4f|*l««0 100 90^0000 

11 S ^1.1920 0 90,0000 

---------- REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible] ----------

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

10 9 100.000 100 100.000

---------- REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible] ----------

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

tt t tOO.OOO loo 100,000 

REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"  "BEST"  STAGE III ACTUAL  STAGE III IMPUTED  APPLICANT IMPUTED

FAMILY SIZE  # IN COLLEGE  PERCENT  PERCENT  PERCENT

7 t 100.000 100 100,000 






TABLE C-1 



JOINT DISTRIBUTION OF "BEST" FAMILY SIZE AND
"BEST" NUMBER IN COLLEGE BY RESPECTIVE 
REPORTED VALUES 

INDEPENDENT STUDENTS

"BEST"
FAMILY SIZE

1 
I 
I 

s 

• 

f 
i 
t 



"BEST"

# IN COLLEGE

1 
♦ 
t 
1 
t 
1 



STAGE III ACTUAL

o«assf 
o,ass9 

0«700« 

o«assf 
o.asss 



STAGE III IMPUTED

I. am 

o.isat 
o.«aaa 

0.0000 
O.tSM 
O.liai 
O.SMT 



«*,S08« 



i«*«ai 

0.107a 

0,«TTa 
O.ITM 
0,5??? 

o,aa*o 

O.IOOO 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible] ----------

"BEST"
FAMILY SIZE

1 

a 
a 
s 
s 



"BEST"

# IN COLLEGE

1 
t 
a 
1 
a 



STAGE III ACTUAL
PERCENT

*;7lt7 

n,*as* 

t.MOO 

o.sTas 



STAGE III IMPUTED
PERCENT

§,7*ta 

SS.SU9 
0.SM9 



APPLICANT IMPUTED
PERCENT

•i^iai! 

•,oili 
• ,tMO 

o.9a?s 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"
FAMILY SIZE

1 
I 
8 
S 



"BEST"

# IN COLLEGE

t 
t 

k 



STAGE III ACTUAL
PERCENT

ts^oo*s 

t«,89*8 

*s«7oas 



STAGE III IMPUTED
PERCENT

18,7871 
•7,878? 



APPLICANT IMPUTED
PERCENT



10,1111 
**,1«17 
S.8««l 






TABLE C-1 



JOINT DISTRIBUTION OF "BEST" FAMILY SIZE AND
"BEST" NUMBER IN COLLEGE BY RESPECTIVE 
REPORTED VALUES 

INDEPENDENT STUDENTS
REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

FAMILY SIZE

"BEST"

# IN COLLEGE

t 
t 
t 
1 
1 
t 



STAGE III ACTUAL

1,909* 
t«*AOt 
0.*StA 



STAGE III IMPUTED

I.STTT 
0.9090 
VI. 8091 
t«ASI« 

i.im 

0.9909 



APPLICANT IMPUTED



2.9**t 
1«SS«9 
«1«S9«I 
1«*«9« 

s,oti* 

9.7tSS 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"
FAMILY SIZE

"BEST"
# IN COLLEGE

STAGE III ACTUAL
PERCENT

STAGE III IMPUTED
PERCENT

APPLICANT IMPUTED
PERCENT



t 
a 
t 
a 
t 
a 



•,0079 

a«,aa9A 

9a,ssn 

S,7919 

T.seao 



• 

ia 
aa 
•a 

9 

a 



s«fao9 
a,*«ai 
as.aota 
98,aias 
•.9an 
a;ias9 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]

"BEST"
FAMILY SIZE

t 
s 
s 

• 



"BEST"

# IN COLLEGE

1 

a 
1 
a 



STAGE III ACTUAL
PERCENT

tsTaarB 
tT,aaia 
««,9a«a 



STAGE III IMPUTED
PERCENT

t*.***7 

SS.SS31 
99.9090 
0.990* 



APPLICANT IMPUTED
PERCENT

•,S790 
ta,7999 
9*, 3799 
ta.9999 






TABLE C-1 



JOINT DISTRIBUTION OF "BEST" FAMILY SIZE AND
"BEST" NUMBER IN COLLEGE BY RESPECTIVE
REPORTED VALUES

INDEPENDENT STUDENTS



"BEST"

t 
I 
S 
• 
« 
I 



"BEST"

# IN COLLEGE

I 
1 
1 
1 

a 

1 



STAGE III ACTUAL
PERCENT

1^199* 
I.OTIS 
• ,MM 
•S,STM 
9.«M1 
2«t90t 

i.iaas 



STAGE III IMPUTED
PERCENT

l.2«Tt 

0.0006 

Ti,*«lT 
11.13*0 
S.STOI 
0.0000 



APPLICANT IMPUTED



>«ssot 

l«Ol*l 
■,T18S 
•I,a09o 
f,OI«t 
l.9taT 
l.OMl 



REPORTED FAMILY SIZE=[illegible] REPORTED # IN COLLEGE=[illegible]
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